
PRODUCTION OF 1,3-PROPANEDIOL



(iii) NUMBER OF SEQUENCES: 68 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Genencor International, Inc. 

(B) STREET: 4 Cambridge Place 

1870 South Winton Road

(C) CITY: Rochester 

(D) STATE: NY

(E) COUNTRY: U.S.A.

(F) ZIP: 14618 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette 

(B) COMPUTER: IBM Compatible 

(C) OPERATING SYSTEM: Windows 

(D) SOFTWARE: FastSEQ for Windows Version 2.0b 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 08/969,683 

(B) FILING DATE: 13-NOV-1997

(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA: 




(A) APPLICATION NUMBER: 60/030,601 

(B) FILING DATE: 13-NOV-1996 



(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Glaister, Debra 

(B) REGISTRATION NUMBER: 33,888 

(C) REFERENCE/DOCKET NUMBER: GC369-2

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 650-864-7620 

(B) TELEFAX: 650-845-6504 

(C) TELEX: 



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1668 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: DHAB1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

ATGAAAAGAT CAAAACGATT TGCAGTACTG GCCCAGCGCC CCGTCAATCA GGACGGGCTG 60 

ATTGGCGAGT GGCCTGAAGA GGGGCTGATC GCCATGGACA GCCCCTTTGA CCCGGTCTCT 120

TCAGTAAAAG TGGACAACGG TCTGATCGTC GAACTGGACG GCAAACGCCG GGACCAGTTT 180

GACATGATCG ACCGATTTAT CGCCGATTAC GCGATCAACG TTGAGCGCAC AGAGCAGGCA 240

ATGCGCCTGG AGGCGGTGGA AATAGCCCGT ATGCTGGTGG ATATTCACGT CAGCCGGGAG 300

GAGATCATTG CCATCACTAC CGCCATCACG CCGGCCAAAG CGGTCGAGGT GATGGCGCAG 360

ATGAACGTGG TGGAGATGAT GATGGCGCTG CAGAAGATGC GTGCCCGCCG GACCCCCTCC 420

AACCAGTGCC ACGTCACCAA TCTCAAAGAT AATCCGGTGC AGATTGCCGC TGACGCCGCC 480

GAGGCCGGGA TCCGCGGCTT CTCAGAACAG GAGACCACGG TCGGTATCGC GCGCTACGCG 540 

CCGTTTAACG CCCTGGCGCT GTTGGTCGGT TCGCAGTGCG GCCGCCCCGG CGTGTTGACG 600

CAGTGCTCGG TGGAAGAGGC CACCGAGCTG GAGCTGGGCA TGCGTGGCTT AACCAGCTAC 660 

GCCGAGACGG TGTCGGTCTA CGGCACCGAA GCGGTATTTA CCGACGGCGA TGATACGCCG 720

TGGTCAAAGG CGTTCCTCGC CTCGGCCTAC GCCTCCCGCG GGTTGAAAAT GCGCTACACC 780 

TCCGGCACCG GATCCGAAGC GCTGATGGGC TATTCGGAGA GCAAGTCGAT GCTCTACCTC 840

GAATCGCGCT GCATCTTCAT TACTAAAGGC GCCGGGGTTC AGGGACTGCA AAACGGCGCG 900 

GTGAGCTGTA TCGGCATGAC CGGCGCTGTG CCGTCGGGCA TTCGGGCGGT GCTGGCGGAA 960 

AACCTGATCG CCTCTATGCT CGACCTCGAA GTGGCGTCCG CCAACGACCA GACTTTCTCC 1020 

CACTCGGATA TTCGCCGCAC CGCGCGCACC CTGATGCAGA TGCTGCCGGG CACCGACTTT 1080 

ATTTTCTCCG GCTACAGCGC GGTGCCGAAC TACGACAACA TGTTCGCCGG CTCGAACTTC 1140 

GATGCGGAAG ATTTTGATGA TTACAACATC CTGCAGCGTG ACCTGATGGT TGACGGCGGC 1200 

CTGCGTCCGG TGACCGAGGC GGAAACCATT GCCATTCGCC AGAAAGCGGC GCGGGCGATC 1260 

CAGGCGGTTT TCCGCGAGCT GGGGCTGCCG CCAATCGCCG ACGAGGAGGT GGAGGCCGCC 1320

ACCTACGCGC ACGGCAGCAA CGAGATGCCG CCGCGTAACG TGGTGGAGGA TCTGAGTGCG 1380 



GTGGAAGAGA TGATGAAGCG CAACATCACC GGCCTCGATA TTGTCGGCGC GCTGAGCCGC 1440

AGCGGCTTTG AGGATATCGC CAGCAATATT CTCAATATGC TGCGCCAGCG GGTCACCGGC 1500 

GATTACCTGC AGACCTCGGC CATTCTCGAT CGGCAGTTCG AGGTGGTGAG TGCGGTCAAC 1560 

GACATCAATG ACTATCAGGG GCCGGGCACC GGCTATCGCA TCTCTGCCGA ACGCTGGGCG 1620

GAGATCAAAA ATATTCCGGG CGTGGTTCAG CCCGACACCA TTGAATAA 1668
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 585 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: DHAB2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

GTGCAACAGA CAACCCAAAT TCAGCCCTCT TTTACCCTGA AAACCCGCGA GGGCGGGGTA 60 

GCTTCTGCCG ATGAACGCGC CGATGAAGTG GTGATCGGCG TCGGCCCTGC CTTCGATAAA 120

CACCAGCATC ACACTCTGAT CGATATGCCC CATGGCGCGA TCCTCAAAGA GCTGATTGCC 180 

GGGGTGGAAG AAGAGGGGCT TCACGCCCGG GTGGTGCGCA TTCTGCGCAC GTCCGACGTC 240 

TCCTTTATGG CCTGGGATGC GGCCAACCTG AGCGGCTCGG GGATCGGCAT CGGTATCCAG 300 

TCGAAGGGGA CCACGGTCAT CCATCAGCGC GATCTGCTGC CGCTCAGCAA CCTGGAGCTG 360 

TTCTCCCAGG CGCCGCTGCT GACGCTGGAG ACCTACCGGC AGATTGGCAA AAACGCTGCG 420

CGCTATGCGC GCAAAGAGTC ACCTTCGCCG GTGCCGGTGG TGAACGATCA GATGGTGCGG 480 

CCGAAATTTA TGGCCAAAGC CGCGCTATTT CATATCAAAG AGACCAAACA TGTGGTGCAG 540

GACGCCGAGC CCGTCACCCT GCACATCGAC TTAGTAAGGG AGTGA 585 

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 426 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: DHAB3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

ATGAGCGAGA AAACCATGCG CGTGCAGGAT TATCCGTTAG CCACCCGCTG CCCGGAGCAT 60 

ATCCTGACGC CTACCGGCAA ACCATTGACC GATATTACCC TCGAGAAGGT GCTCTCTGGC 120

GAGGTGGGCC CGCAGGATGT GCGGATCTCC CGCCAGACCC TTGAGTACCA GGCGCAGATT 180 

GCCGAGCAGA TGCAGCGCCA TGCGGTGGCG CGCAATTTCC GCCGCGCGGC GGAGCTTATC 240

GCCATTCCTG ACGAGCGCAT TCTGGCTATC TATAACGCGC TGCGCCCGTT CCGCTCCTCG 300

CAGGCGGAGC TGCTGGCGAT CGCCGACGAG CTGGAGCACA CCTGGCATGC GACAGTGAAT 360

GCCGCCTTTG TCCGGGAGTC GGCGGAAGTG TATCAGCAGC GGCATAAGCT GCGTAAAGGA 420

AGCTAA 426
(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1164 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: DHAT 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

ATGAGCTATC GTATGTTTGA TTATCTGGTG CCAAACGTTA ACTTTTTTGG CCCCAACGCC 60 

ATTTCCGTAG TCGGCGAACG CTGCCAGCTG CTGGGGGGGA AAAAAGCCCT GCTGGTCACC 120

GACAAAGGCC TGCGGGCAAT TAAAGATGGC GCGGTGGACA AAACCCTGCA TTATCTGCGG 180 

GAGGCCGGGA TCGAGGTGGC GATCTTTGAC GGCGTCGAGC CGAACCCGAA AGACACCAAC 240

GTGCGCGACG GCCTCGCCGT GTTTCGCCGC GAACAGTGCG ACATCATCGT CACCGTGGGC 300 

GGCGGCAGCC CGCACGATTG CGGCAAAGGC ATCGGCATCG CCGCCACCCA TGAGGGCGAT 360 

CTGTACCAGT ATGCCGGAAT CGAGACCCTG ACCAACCCGC TGCCGCCTAT CGTCGCGGTC 420

AATACCACCG CCGGCACCGC CAGCGAGGTC ACCCGCCACT GCGTCCTGAC CAACACCGAA 480 

ACCAAAGTGA AGTTTGTGAT CGTCAGCTGG CGCAAACTGC CGTCGGTCTC TATCAACGAT 540

CCACTGCTGA TGATCGGTAA ACCGGCCGCC CTGACCGCGG CGACCGGGAT GGATGCCCTG 600 

ACCCACGCCG TAGAGGCCTA TATCTCCAAA GACGCTAACC CGGTGACGGA CGCCGCCGCC 660 




ATGCAGGCGA TCCGCCTCAT CGCCCGCAAC CTGCGCCAGG CCGTGGCCCT CGGCAGCAAT 720

CTGCAGGCGC GGGAAAACAT GGCCTATGCT TCTCTGCTGG CCGGGATGGC TTTCAATAAC 780 

GCCAACCTCG GCTACGTGCA CGCCATGGCG CACCAGCTGG GCGGCCTGTA CGACATGCCG 840

CACGGCGTGG CCAACGCTGT CCTGCTGCCG CATGTGGCGC GCTACAACCT GATCGCCAAC 900 

CCGGAGAAAT TCGCCGATAT CGCTGAACTG ATGGGCGAAA ATATCACCGG ACTGTCCACT 960 

CTCGACGCGG CGGAAAAAGC CATCGCCGCT ATCACGCGTC TGTCGATGGA TATCGGTATT 1020

CCGCAGCATC TGCGCGATCT GGGGGTAAAA GAGGCCGACT TCCCCTACAT GGCGGAGATG 1080 

GCTCTAAAAG ACGGCAATGC GTTCTCGAAC CCGCGTAAAG GCAACGAGCA GGAGATTGCC 1140

GCGATTTTCC GCCAGGCATT CTGA 1164 
(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1380 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GPD1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

CTTTAATTTT CTTTTATCTT ACTCTCCTAC ATAAGACATC AAGAAACAAT TGTATATTGT 60 

ACACCCCCCC CCTCCACAAA CACAAATATT GATAATATAA AGATGTCTGC TGCTGCTGAT 120

AGATTAAACT TAACTTCCGG CCACTTGAAT GCTGGTAGAA AGAGAAGTTC CTCTTCTGTT 180 

TCTTTGAAGG CTGCCGAAAA GCCTTTCAAG GTTACTGTGA TTGGATCTGG TAACTGGGGT 240

ACTACTATTG CCAAGGTGGT TGCCGAAAAT TGTAAGGGAT ACCCAGAAGT TTTCGCTCCA 300

ATAGTACAAA TGTGGGTGTT CGAAGAAGAG ATCAATGGTG AAAAATTGAC TGAAATCATA 360

AATACTAGAC ATCAAAACGT GAAATACTTG CCTGGCATCA CTCTACCCGA CAATTTGGTT 420

GCTAATCCAG ACTTGATTGA TTCAGTCAAG GATGTCGACA TCATCGTTTT CAACATTCCA 480

CATCAATTTT TGCCCCGTAT CTGTAGCCAA TTGAAAGGTC ATGTTGATTC ACACGTCAGA 540

GCTATCTCCT GTCTAAAGGG TTTTGAAGTT GGTGCTAAAG GTGTCCAATT GCTATCCTCT 600 

TACATCACTG AGGAACTAGG TATTCAATGT GGTGCTCTAT CTGGTGCTAA CATTGCCACC 660 

GAAGTCGCTC AAGAACACTG GTCTGAAACA ACAGTTGCTT ACCACATTCC AAAGGATTTC 720



AGAGGCGAGG GCAAGGACGT CGACCATAAG GTTCTAAAGG CCTTGTTCCA CAGACCTTAC 780 

TTCCACGTTA GTGTCATCGA AGATGTTGCT GGTATCTCCA TCTGTGGTGC TTTGAAGAAC 840

GTTGTTGCCT TAGGTTGTGG TTTCGTCGAA GGTCTAGGCT GGGGTAACAA CGCTTCTGCT 900

GCCATCCAAA GAGTCGGTTT GGGTGAGATC ATCAGATTCG GTCAAATGTT TTTCCCAGAA 960

TCTAGAGAAG AAACATACTA CCAAGAGTCT GCTGGTGTTG CTGATTTGAT CACCACCTGC 1020

GCTGGTGGTA GAAACGTCAA GGTTGCTAGG CTAATGGCTA CTTCTGGTAA GGACGCCTGG 1080

GAATGTGAAA AGGAGTTGTT GAATGGCCAA TCCGCTCAAG GTTTAATTAC CTGCAAAGAA 1140

GTTCACGAAT GGTTGGAAAC ATGTGGCTCT GTCGAAGACT TCCCATTATT TGAAGCCGTA 1200

TACCAAATCG TTTACAACAA CTACCCAATG AAGAACCTGC CGGACATGAT TGAAGAATTA 1260

GATCTACATG AAGATTAGAT TTATTGGAGA AAGATAACAT ATCATACTTC CCCCACTTTT 1320

TTCGAGGCTC TTCTATATCA TATTCATAAA TTAGCATTAT GTCATTTCTC ATAACTACTT 1380

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2946 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GPD2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

GAATTCGAGC CTGAAGTGCT GATTACCTTC AGGTAGACTT CATCTTGACC CATCAACCCC 60 

AGCGTCAATC CTGCAAATAC ACCACCCAGC AGCACTAGGA TGATAGAGAT AATATAGTAC 120

GTGGTAACGC TTGCCTCATC ACCTACGCTA TGGCCGGAAT CGGCAACATC CCTAGAATTG 180

AGTACGTGTG ATCCGGATAA CAACGGCAGT GAATATATCT TCGGTATCGT AAAGATGTGA 240

TATAAGATGA TGTATACCCA ATGAGGAGCG CCTGATCGTG ACCTAGACCT TAGTGGCAAA 300

AACGACATAT CTATTATAGT GGGGAGAGTT TCGTGCAAAT AACAGACGCA GCAGCAAGTA 360

ACTGTGACGA TATCAACTCT TTTTTTATTA TGTAATAAGC AAACAAGCAC GAATGGGGAA 420

AGCCTATGTG CAATCACCAA GGTCGTCCCT TTTTTCCCAT TTGCTAATTT AGAATTTAAA 480

GAAACCAAAA GAATGAAGAA AGAAAACAAA TACTAGCCCT AACCCTGACT TCGTTTCTAT 540

GATAATACCC TGCTTTAATG AACGGTATGC CCTAGGGTAT ATCTCACTCT GTACGTTACA 600



AACTCCGGTT ATTTTATCGG AACATCCGAG CACCCGCGCC TTCCTCAACC CAGGCACCGC 660 

CCCAGGTAAC CGTGCGCGAT GAGCTAATCC TGAGCCATCA CCCACCCCAC CCGTTGATGA 720 

CAGCAATTCG GGAGGGCGAA AATAAAACTG GAGCAAGGAA TTACCATCAC CGTCACCATC 780 

ACCATCATAT CGCCTTAGCC TCTAGCCATA GCCATCATGC AAGCGTGTAT CTTCTAAGAT 840 

TCAGTCATCA TCATTACCGA GTTTGTTTTC CTTCACATGA TGAAGAAGGT TTGAGTATGC 900 

TCGAAACAAT AAGACGACGA TGGCTCTGCC ATTGGTTATA TTACGCTTTT GCGGCGAGGT 960 

GCCGATGGGT TGCTGAGGGG AAGAGTGTTT AGCTTACGGA CCTATTGCCA TTGTTATTCC 1020 

GATTAATCTA TTGTTCAGCA GCTCTTCTCT ACCCTGTCAT TCTAGTATTT TTTTTTTTTT 1080 

TTTTTGGTTT TACTTTTTTT TCTTCTTGCC TTTTTTTCTT GTTACTTTTT TTCTAGTTTT 1140 

TTTTCCTTCC ACTAAGCTTT TTCCTTGATT TATCCTTGGG TTCTTCTTTC TACTCCTTTA 1200

GATTTTTTTT TTATATATTA ATTTTTAAGT TTATGTATTT TGGTAGATTC AATTCTCTTT 1260

CCCTTTCCTT TTCCTTCGCT CCCCTTCCTT ATCAATGCTT GCTGTCAGAA GATTAACAAG 1320

ATACACATTC CTTAAGCGAA CGCATCCGGT GTTATATACT CGTCGTGCAT ATAAAATTTT 1380

GCCTTCAAGA TCTACTTTCC TAAGAAGATC ATTATTACAA ACACAACTGC ACTCAAAGAT 1440

GACTGCTCAT ACTAATATCA AACAGCACAA ACACTGTCAT GAGGACCATC CTATCAGAAG 1500

ATCGGACTCT GCCGTGTCAA TTGTACATTT GAAACGTGCG CCCTTCAAGG TTACAGTGAT 1560

TGGTTCTGGT AACTGGGGGA CCACCATCGC CAAAGTCATT GCGGAAAACA CAGAATTGCA 1620

TTCCCATATC TTCGAGCCAG AGGTGAGAAT GTGGGTTTTT GATGAAAAGA TCGGCGACGA 1680

AAATCTGACG GATATCATAA ATACAAGACA CCAGAACGTT AAATATCTAC CCAATATTGA 1740

CCTGCCCCAT AATCTAGTGG CCGATCCTGA TCTTTTACAC TCCATCAAGG GTGCTGACAT 1800

CCTTGTTTTC AACATCCCTC ATCAATTTTT ACCAAACATA GTCAAACAAT TGCAAGGCCA 1860

CGTGGCCCCT CATGTAAGGG CCATCTCGTG TCTAAAAGGG TTCGAGTTGG GCTCCAAGGG 1920 

TGTGCAATTG CTATCCTCCT ATGTTACTGA TGAGTTAGGA ATCCAATGTG GCGCACTATC 1980 

TGGTGCAAAC TTGGCACCGG AAGTGGCCAA GGAGCATTGG TCCGAAACCA CCGTGGCTTA 2040

CCAACTACCA AAGGATTATC AAGGTGATGG CAAGGATGTA GATCATAAGA TTTTGAAATT 2100 

GCTGTTCCAC AGACCTTACT TCCACGTCAA TGTCATCGAT GATGTTGCTG GTATATCCAT 2160 

TGCCGGTGCC TTGAAGAACG TCGTGGCACT TGCATGTGGT TTCGTAGAAG GTATGGGATG 2220

GGGTAACAAT GCCTCCGCAG CCATTCAAAG GCTGGGTTTA GGTGAAATTA TCAAGTTCGG 2280




TAGAATGTTT TTCCCAGAAT CCAAAGTCGA GACCTACTAT CAAGAATCCG CTGGTGTTGC 2340

AGATCTGATC ACCACCTGCT CAGGCGGTAG AAACGTCAAG GTTGCCACAT ACATGGCCAA 2400

GACCGGTAAG TCAGCCTTGG AAGCAGAAAA GGAATTGCTT AACGGTCAAT CCGCCCAAGG 2460

GATAATCACA TGCAGAGAAG TTCACGAGTG GCTACAAACA TGTGAGTTGA CCCAAGAATT 2520 

CCCAATTATT CGAGGCAGTC TACCAGATAG TCTACAACAA CGTCCGCATG GAAGACCTAC 2580 

CGGAGATGAT TGAAGAGCTA GACATCGATG ACGAATAGAC ACTCTCCCCC CCCCTCCCCC 2640

TCTGATCTTT CCTGTTGCCT CTTTTTCCCC CAACCAATTT ATCATTATAC ACAAGTTCTA 2700

CAACTACTAC TAGTAACATT ACTACAGTTA TTATAATTTT CTATTCTCTT TTTCTTTAAG 2760

AATCTATCAT TAACGTTAAT TTCTATATAT ACATAACTAC CATTATACAC GCTATTATCG 2820

TTTACATATC ACATCACCGT TAATGAAAGA TACGACACCC TGTACACTAA CACAATTAAA 2880

TAATCGCCAT AACCTTTTCT GTTATCTATA GCCCTTAAAG CTGTTTCTTC GAGCTTTTCA 2940

CTGCAG 2946
(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3178 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GUT2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

CTGCAGAACT TCGTCTGCTC TGTGCCCATC CTCGCGGTTA GAAAGAAGCT GAATTGTTTC 60 

ATGCGCAAGG GCATCAGCGA GTGACCAATA ATCACTGCAC TAATTCCTTT TTAGCAACAC 120

ATACTTATAT ACAGCACCAG ACCTTATGTC TTTTCTCTGC TCCGATACGT TATCCCACCC 180 

AACTTTTATT TCAGTTTTGG CAGGGGAAAT TTCACAACCC CGCACGCTAA AAATCGTATT 240 

TAAACTTAAA AGAGAACAGC CACAAATAGG GAACTTTGGT CTAAACGAAG GACTCTCCCT 300

CCCTTATCTT GACCGTGCTA TTGCCATCAC TGCTACAAGA CTAAATACGT ACTAATATAT 360

GTTTTCGGTA ACGAGAAGAA GAGCTGCCGG TGCAGCTGCT GCCATGGCCA CAGCCACGGG 420

GACGCTGTAC TGGATGACTA GCCAAGGTGA TAGGCCGTTA GTGCACAATG ACCCGAGCTA 480

CATGGTGCAA TTCCCCACCG CCGCTCCACC GGCAGGTCTC TAGACGAGAC CTGCTGGACC 540 



GTCTGGACAA GACGCATCAA TTCGACGTGT TGATCATCGG TGGCGGGGCC ACGGGGACAG 600 

GATGTGCCCT AGATGCTGCG ACCAGGGGAC TCAATGTGGC CCTTGTTGAA AAGGGGGATT 660 

TTGCCTCGGG AACGTCGTCC AAATCTACCA AGATGATTCA CGGTGGGGTG CGGTACTTAG 720

AGAAGGCCTT CTGGGAGTTC TCCAAGGCAC AACTGGATCT GGTCATCGAG GCACTCAACG 780

AGCGTAAACA TCTTATCAAC ACTGCCCCTC ACCTGTGCAC GGTGCTACCA ATTCTGATCC 840

CCATCTACAG CACCTGGCAG GTCCCGTACA TCTATATGGG CTGTAAATTC TACGATTTCT 900 

TTGGCGGTTC CCAAAACTTG AAAAAATCAT ACCTACTGTC CAAATCCGCC ACCGTGGAGA 960 

AGGCTCCCAT GCTTACCACA GACAATTTAA AGGCCTCGCT TGTGTACCAT GATGGGTCCT 1020

TTAACGACTC GCGTTTGAAC GCCACTTTAG CCATCACGGG TGTGGAGAAC GGCGCTACCG 1080

TCTTGATCTA TGTCGAGGTA CAAAAATTGA TCAAAGACCC AACTTCTGGT AAGGTTATCG 1140

GTGCCGAGGC CCGGGACGTT GAGACTAATG AGCTTGTCAG AATCAACGCT AAATGTGTGG 1200

TCAATGCCAC GGGCCCATAC AGTGACGCCA TTTTGCAAAT GGACCGCAAC CCATCCGGTC 1260

TGCCGGACTC CCCGCTAAAC GACAACTCCA AGATCAAGTC GACTTTCAAT CAAATCTCCG 1320

TCATGGACCC GAAAATGGTC ATCCCATCTA TTGGCGTTCA CATCGTATTG CCCTCTTTTT 1380

ACTCCCCGAA GGATATGGGT TTGTTGGACG TCAGAACCTC TGATGGCAGA GTGATGTTCT 1440

TTTTACCTTG GCAGGGCAAA GTCCTTGCCG GCACCACAGA CATCCCACTA AAGCAAGTCC 1500

CAGAAAACCC TATGCCTACA GAGGCTGATA TTCAAGATAT CTTGAAAGAA CTACAGCACT 1560

ATATCGAATT CCCCGTGAAA AGAGAAGACG TGCTAAGTGC ATGGGCTGGT GTCAGACCTT 1620

TGGTCAGAGA TCCACGTACA ATCCCCGCAG ACGGGAAGAA GGGCTCTGCC ACTCAGGGCG 1680

TGGTAAGATC CCACTTCTTG TTCACTTCGG ATAATGGCCT AATTACTATT GCAGGTGGTA 1740

AATGGACTAC TTACAGACAA ATGGCTGAGG AAACAGTCGA CAAAGTTGTC GAAGTTGGCG 1800

GATTCCACAA CCTGAAACCT TGTCACACAA GAGATATTAA GCTTGCTGGT GCAGAAGAAT 1860

GGACGCAAAA CTATGTGGCT TTATTGGCTC AAAACTACCA TTTATCATCA AAAATGTCCA 1920

ACTACTTGGT TCAAAACTAC GGAACCCGTT CCTCTATCAT TTGCGAATTT TTCAAAGAAT 1980

CCATGGAAAA TAAACTGCCT TTGTCCTTAG CCGACAAGGA AAATAACGTA ATCTACTCTA 2040

GCGAGGAGAA CAACTTGGTC AATTTTGATA CTTTCAGATA TCCATTCACA ATCGGTGAGT 2100

TAAAGTATTC CATGCAGTAC GAATATTGTA GAACTCCCTT GGACTTCCTT TTAAGAAGAA 2160

CAAGATTCGC CTTCTTGGAC GCCAAGGAAG CTTTGAATGC CGTGCATGCC ACCGTCAAAG 2220

TTATGGGTGA TGAGTTCAAT TGGTCGGAGA AAAAGAGGCA GTGGGAACTT GAAAAAACTG 2280



TGAACTTCAT CCAAGGACGT TTCGGTGTCT AAATCGATCA TGATAGTTAA GGGTGACAAA 2340 

GATAACATTC ACAAGAGTAA TAATAATGGT AATGATGATA ATAATAATAA TGATAGTAAT 2400

AACAATAATA ATAATGGTGG TAATGGCAAT GAAATCGCTA TTATTACCTA TTTTCCTTAA 2460 

TGGAAGAGTT AAAGTAAACT AAAAAAACTA CAAAAATATA TGAAGAAAAA AAAAAAAAGA 2520

GGTAATAGAC TCTACTACTA CAATTGATCT TCAAATTATG ACCTTCCTAG TGTTTATATT 2580

CTATTTCCAA TACATAATAT AATCTATATA ATCATTGCTG GTAGACTTCC GTTTTAATAT 2640 

CGTTTTAATT ATCCCCTTTA TCTCTAGTCT AGTTTTATCA TAAAATATAG AAACACTAAA 2700 

TAATATTCTT CAAACGGTCC TGGTGCATAC GCAATACATA TTTATGGTGC AAAAAAAAAA 2760 

ATGGAAAATT TTGCTAGTCA TAAACCCTTT CATAAAACAA TACGTAGACA TCGCTACTTG 2820

AAATTTTCAA GTTTTTATCA GATCCATGTT TCCTATCTGC CTTGACAACC TCATCGTCGA 2880

AATAGTACCA TTTAGAACGC CCAATATTCA CATTGTGTTC AAGGTCTTTA TTCACCAGTG 2940

ACGTGTAATG GCCATGATTA ATGTGCCTGT ATGGTTAACC ACTCCAAATA GCTTATATTT 3000

CATAGTGTCA TTGTTTTTCA ATATAATGTT TAGTATCAAT GGATATGTTA CGACGGTGTT 3060 

ATTTTTCTTG GTCAAATCGT AATAAAATCT CGATAAATGG ATGACTAAGA TTTTTGGTAA 3120

AGTTACAAAA TTTATCGTTT TCACTGTTGT CAATTTTTTG TTCTTGTAAT CACTCGAG 3178 
(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 816 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GPP1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

ATGAAACGTT TCAATGTTTT AAAATATATC AGAACAACAA AAGCAAATAT ACAAACCATC 60 

GCAATGCCTT TGACCACAAA ACCTTTATCT TTGAAAATCA ACGCCGCTCT ATTCGATGTT 120

GACGGTACCA TCATCATCTC TCAACCAGCC ATTGCTGCTT TCTGGAGAGA TTTCGGTAAA 180 

GACAAGCCTT ACTTCGATGC CGAACACGTT ATTCACATCT CTCACGGTTG GAGAACTTAC 240

GATGCCATTG CCAAGTTCGC TCCAGACTTT GCTGATGAAG AATACGTTAA CAAGCTAGAA 300

GGTGAAATCC CAGAAAAGTA CGGTGAACAC TCCATCGAAG TTCCAGGTGC TGTCAAGTTG 360 



TGTAATGCTT TGAACGCCTT GCCAAAGGAA AAATGGGCTG TCGCCACCTC TGGTACCCGT 420

GACATGGCCA AGAAATGGTT CGACATTTTG AAGATCAAGA GACCAGAATA CTTCATCACC 480

GCCAATGATG TCAAGCAAGG TAAGCCTCAC CCAGAACCAT ACTTAAAGGG TAGAAACGGT 540 

TTGGGTTTCC CAATTAATGA ACAAGACCCA TCCAAATCTA AGGTTGTTGT CTTTGAAGAC 600 

GCACCAGCTG GTATTGCTGC TGGTAAGGCT GCTGGCTGTA AAATCGTTGG TATTGCTACC 660 

ACTTTCGATT TGGACTTCTT GAAGGAAAAG GGTTGTGACA TCATTGTCAA GAACCACGAA 720

TCTATCAGAG TCGGTGAATA CAACGCTGAA ACCGATGAAG TCGAATTGAT CTTTGATGAC 780 

TACTTATACG CTAAGGATGA CTTGTTGAAA TGGTAA 816 
(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 753 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GPP2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

ATGGGATTGA CTACTAAACC TCTATCTTTG AAAGTTAACG CCGCTTTGTT CGACGTCGAC 60 

GGTACCATTA TCATCTCTCA ACCAGCCATT GCTGCATTCT GGAGGGATTT CGGTAAGGAC 120

AAACCTTATT TCGATGCTGA ACACGTTATC CAAGTCTCGC ATGGTTGGAG AACGTTTGAT 180 

GCCATTGCTA AGTTCGCTCC AGACTTTGCC AATGAAGAGT ATGTTAACAA ATTAGAAGCT 240 

GAAATTCCGG TCAAGTACGG TGAAAAATCC ATTGAAGTCC CAGGTGCAGT TAAGCTGTGC 300

AACGCTTTGA ACGCTCTACC AAAAGAGAAA TGGGCTGTGG CAACTTCCGG TACCCGTGAT 360

ATGGCACAAA AATGGTTCGA GCATCTGGGA ATCAGGAGAC CAAAGTACTT CATTACCGCT 420

AATGATGTCA AACAGGGTAA GCCTCATCCA GAACCATATC TGAAGGGCAG GAATGGCTTA 480

GGATATCCGA TCAATGAGCA AGACCCTTCC AAATCTAAGG TAGTAGTATT TGAAGACGCT 540

CCAGCAGGTA TTGCCGCCGG AAAAGCCGCC GGTTGTAAGA TCATTGGTAT TGCCACTACT 600 

TTCGACTTGG ACTTCCTAAA GGAAAAAGGC TGTGACATCA TTGTCAAAAA CCACGAATCC 660 

ATCAGAGTTG GCGGCTACAA TGCCGAAACA GACGAAGTTG AATTCATTTT TGACGACTAC 720

TTATATGCTA AGGACGATCT GTTGAAATGG TAA 753 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2520 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: GUT1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

TGTATTGGCC ACGATAACCA CCCTTTGTAT ACTGTTTTTG TTTTTCACAT GGTAAATAAC 60 

GACTTTTATT AAACAACGTA TGTAAAAACA TAACAAGAAT CTACCCATAC AGGCCATTTC 120 

GTAATTCTTC TCTTCTAATT GGAGTAAAAC CATCAATTAA AGGGTGTGGA GTAGCATAGT 180 

GAGGGGCTGA CTGCATTGAC AAAAAAATTG AAAAAAAAAA AGGAAAAGGA AAGGAAAAAA 240 

AGACAGCCAA GACTTTTAGA ACGGATAAGG TGTAATAAAA TGTGGGGGGA TGCCTGTTCT 300 

CGAACCATAT AAAATATACC ATGTGGTTTG AGTTGTGGCC GGAACTATAC AAATAGTTAT 360 

ATGTTTCCCT CTCTCTTCCG ACTTGTAGTA TTCTCCAAAC GTTACATATT CCGATCAAGC 420

CAGCGCCTTT ACACTAGTTT AAAACAAGAA CAGAGCCGTA TGTCCAAAAT AATGGAAGAT 480 

TTACGAAGTG ACTACGTCCC GCTTATCGCC AGTATTGATG TAGGAACGAC CTCATCCAGA 540 

TGCATTCTGT TCAACAGATG GGGCCAGGAC GTTTCAAAAC ACCAAATTGA ATATTCAACT 600 

TCAGCATCGA AGGGCAAGAT TGGGGTGTCT GGCCTAAGGA GACCCTCTAC AGCCCCAGCT 660 

CGTGAAACAC CAAACGCCGG TGACATCAAA ACCAGCGGAA AGCCCATCTT TTCTGCAGAA 720 

GGCTATGCCA TTCAAGAAAC CAAATTCCTA AAAATCGAGG AATTGGACTT GGACTTCCAT 780 

AACGAACCCA CGTTGAAGTT CCCCAAACCG GGTTGGGTTG AGTGCCATCC GCAGAAATTA 840 

CTGGTGAACG TCGTCCAATG CCTTGCCTCA AGTTTGCTCT CTCTGCAGAC TATCAACAGC 900 

GAACGTGTAG CAAACGGTCT CCCACCTTAC AAGGTAATAT GCATGGGTAT AGCAAACATG 960 

AGAGAAACCA CAATTCTGTG GTCCCGCCGC ACAGGAAAAC CAATTGTTAA CTACGGTATT 1020 

GTTTGGAACG ACACCAGAAC GATCAAAATC GTTAGAGACA AATGGCAAAA CACTAGCGTC 1080 

GATAGGCAAC TGCAGCTTAG ACAGAAGACT GGATTGCCAT TGCTCTCCAC GTATTTCTCC 1140 

TGTTCCAAGC TGCGCTGGTT CCTCGACAAT GAGCCTCTGT GTACCAAGGC GTATGAGGAG 1200

AACGACCTGA TGTTCGGCAC TGTGGACACA TGGCTGATTT ACCAATTAAC TAAACAAAAG 1260



GCGTTCGTTT CTGACGTAAC CAACGCTTCC AGAACTGGAT TTATGAACCT CTCCACTTTA 1320 

AAGTACGACA ACGAGTTGCT GGAATTTTGG GGTATTGACA AGAACCTGAT TCACATGCCC 1380

GAAATTGTGT CCTCATCTCA ATACTACGGT GACTTTGGCA TTCCTGATTG GATAATGGAA 1440

AAGCTACACG ATTCGCCAAA AACAGTACTG CGAGATCTAG TCAAGAGAAA CCTGCCCATA 1500 

CAGGGCTGTC TGGGCGACCA AAGCGCATCC ATGGTGGGGC AACTCGCTTA CAAACCCGGT 1560 

GCTGCAAAAT GTACTTATGG TACCGGTTGC TTTTTACTGT ACAATACGGG GACCAAAAAA 1620 

TTGATCTCCC AACATGGCGC ACTGACGACT CTAGCATTTT GGTTCCCACA TTTGCAAGAG 1680 

TACGGTGGCC AAAAACCAGA ATTGAGCAAG CCACATTTTG CATTAGAGGG TTCCGTCGCT 1740

GTGGCTGGTG CTGTGGTCCA ATGGCTACGT GATAATTTAC GATTGATCGA TAAATCAGAG 1800

GATGTCGGAC CGATTGCATC TACGGTTCCT GATTCTGGTG GCGTAGTTTT CGTCCCCGCA 1860 

TTTAGTGGCC TATTCGCTCC CTATTGGGAC CCAGATGCCA GAGCCACCAT AATGGGGATG 1920

TCTCAATTCA CTACTGCCTC CCACATCGCC AGAGCTGCCG TGGAAGGTGT TTGCTTTCAA 1980

GCCAGGGCTA TCTTGAAGGC AATGAGTTCT GACGCGTTTG GTGAAGGTTC CAAAGACAGG 2040

GACTTTTTAG AGGAAATTTC CGACGTCACA TATGAAAAGT CGCCCCTGTC GGTTCTGGCA 2100 

GTGGATGGCG GGATGTCGAG GTCTAATGAA GTCATGCAAA TTCAAGCCGA TATCCTAGGT 2160 

CCCTGTGTCA AAGTCAGAAG GTCTCCGACA GCGGAATGTA CCGCATTGGG GGCAGCCATT 2220

GCAGCCAATA TGGCTTTCAA GGATGTGAAC GAGCGCCCAT TATGGAAGGA CCTACACGAT 2280

GTTAAGAAAT GGGTCTTTTA CAATGGAATG GAGAAAAACG AACAAATATC ACCAGAGGCT 2340 

CATCCAAACC TTAAGATATT CAGAAGTGAA TCCGACGATG CTGAAAGGAG AAAGCATTGG 2400

AAGTATTGGG AAGTTGCCGT GGAAAGATCC AAAGGTTGGC TGAAGGACAT AGAAGGTGAA 2460 

CACGAACAGG TTCTAGAAAA CTTCCAATAA CAACATAAAT AATTTCTATT AACAATGTAA 2520
(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 391 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GPD1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:



Met Ser Ala Ala Ala Asp Arg Leu Asn Leu Thr Ser Gly His Leu Asn
1 5 10 15

Ala Gly Arg Lys Arg Ser Ser Ser Ser Val Ser Leu Lys Ala Ala Glu
20 25 30

Lys Pro Phe Lys Val Thr Val Ile Gly Ser Gly Asn Trp Gly Thr Thr
35 40 45

Ile Ala Lys Val Val Ala Glu Asn Cys Lys Gly Tyr Pro Glu Val Phe
50 55 60

Ala Pro Ile Val Gln Met Trp Val Phe Glu Glu Glu Ile Asn Gly Glu
65 70 75 80

Lys Leu Thr Glu Ile Ile Asn Thr Arg His Gln Asn Val Lys Tyr Leu
85 90 95

Pro Gly Ile Thr Leu Pro Asp Asn Leu Val Ala Asn Pro Asp Leu Ile
100 105 110

Asp Ser Val Lys Asp Val Asp Ile Ile Val Phe Asn Ile Pro His Gln
115 120 125

Phe Leu Pro Arg Ile Cys Ser Gln Leu Lys Gly His Val Asp Ser His
130 135 140

Val Arg Ala Ile Ser Cys Leu Lys Gly Phe Glu Val Gly Ala Lys Gly
145 150 155 160

Val Gln Leu Leu Ser Ser Tyr Ile Thr Glu Glu Leu Gly Ile Gln Cys
165 170 175

Gly Ala Leu Ser Gly Ala Asn Ile Ala Thr Glu Val Ala Gln Glu His
180 185 190

Trp Ser Glu Thr Thr Val Ala Tyr His Ile Pro Lys Asp Phe Arg Gly
195 200 205

Glu Gly Lys Asp Val Asp His Lys Val Leu Lys Ala Leu Phe His Arg
210 215 220

Pro Tyr Phe His Val Ser Val Ile Glu Asp Val Ala Gly Ile Ser Ile
225 230 235 240

Cys Gly Ala Leu Lys Asn Val Val Ala Leu Gly Cys Gly Phe Val Glu
245 250 255

Gly Leu Gly Trp Gly Asn Asn Ala Ser Ala Ala Ile Gln Arg Val Gly
260 265 270

Leu Gly Glu Ile Ile Arg Phe Gly Gln Met Phe Phe Pro Glu Ser Arg
275 280 285

Glu Glu Thr Tyr Tyr Gln Glu Ser Ala Gly Val Ala Asp Leu Ile Thr
290 295 300

Thr Cys Ala Gly Gly Arg Asn Val Lys Val Ala Arg Leu Met Ala Thr
305 310 315 320

Ser Gly Lys Asp Ala Trp Glu Cys Glu Lys Glu Leu Leu Asn Gly Gln
325 330 335

Ser Ala Gln Gly Leu Ile Thr Cys Lys Glu Val His Glu Trp Leu Glu
340 345 350

Thr Cys Gly Ser Val Glu Asp Phe Pro Leu Phe Glu Ala Val Tyr Gln
355 360 365

Ile Val Tyr Asn Asn Tyr Pro Met Lys Asn Leu Pro Asp Met Ile Glu
370 375 380

Glu Leu Asp Leu His Glu Asp
385 390

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 384 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GPD2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Met Thr Ala His Thr Asn Ile Lys Gln His Lys His Cys His Glu Asp
1 5 10 15

His Pro Ile Arg Arg Ser Asp Ser Ala Val Ser Ile Val His Leu Lys
20 25 30

Arg Ala Pro Phe Lys Val Thr Val Ile Gly Ser Gly Asn Trp Gly Thr
35 40 45

Thr Ile Ala Lys Val Ile Ala Glu Asn Thr Glu Leu His Ser His Ile
50 55 60

Phe Glu Pro Glu Val Arg Met Trp Val Phe Asp Glu Lys Ile Gly Asp
65 70 75 80

Glu Asn Leu Thr Asp Ile Ile Asn Thr Arg His Gln Asn Val Lys Tyr
85 90 95

Leu Pro Asn Ile Asp Leu Pro His Asn Leu Val Ala Asp Pro Asp Leu
100 105 110

Leu His Ser Ile Lys Gly Ala Asp Ile Leu Val Phe Asn Ile Pro His
115 120 125

Gln Phe Leu Pro Asn Ile Val Lys Gln Leu Gln Gly His Val Ala Pro
130 135 140

His Val Arg Ala Ile Ser Cys Leu Lys Gly Phe Glu Leu Gly Ser Lys
145 150 155 160

Gly Val Gln Leu Leu Ser Ser Tyr Val Thr Asp Glu Leu Gly Ile Gln
165 170 175

Cys Gly Ala Leu Ser Gly Ala Asn Leu Ala Pro Glu Val Ala Lys Glu
180 185 190

His Trp Ser Glu Thr Thr Val Ala Tyr Gln Leu Pro Lys Asp Tyr Gln
195 200 205

Gly Asp Gly Lys Asp Val Asp His Lys Ile Leu Lys Leu Leu Phe His
210 215 220

Arg Pro Tyr Phe His Val Asn Val Ile Asp Asp Val Ala Gly Ile Ser
225 230 235 240

Ile Ala Gly Ala Leu Lys Asn Val Val Ala Leu Ala Cys Gly Phe Val
245 250 255

Glu Gly Met Gly Trp Gly Asn Asn Ala Ser Ala Ala Ile Gln Arg Leu
260 265 270

Gly Leu Gly Glu Ile Ile Lys Phe Gly Arg Met Phe Phe Pro Glu Ser
275 280 285

Lys Val Glu Thr Tyr Tyr Gln Glu Ser Ala Gly Val Ala Asp Leu Ile
290 295 300

Thr Thr Cys Ser Gly Gly Arg Asn Val Lys Val Ala Thr Tyr Met Ala
305 310 315 320

Lys Thr Gly Lys Ser Ala Leu Glu Ala Glu Lys Glu Leu Leu Asn Gly
325 330 335

Gln Ser Ala Gln Gly Ile Ile Thr Cys Arg Glu Val His Glu Trp Leu
340 345 350

Gln Thr Cys Glu Leu Thr Gln Glu Phe Pro Ile Ile Arg Gly Ser Leu
355 360 365

Pro Asp Ser Leu Gln Gln Arg Pro His Gly Arg Pro Thr Gly Asp Asp
370 375 380



(2) INFORMATION FOR SEQ ID NO: 13: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 614 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown 



(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: GUT2 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Met Thr Arg Ala Thr Trp Cys Asn Ser Pro Pro Pro Leu His Arg Gln
1 5 10 15

Val Ser Arg Arg Asp Leu Leu Asp Arg Leu Asp Lys Thr His Gln Phe
20 25 30

Asp Val Leu Ile Ile Gly Gly Gly Ala Thr Gly Thr Gly Cys Ala Leu
35 40 45

Asp Ala Ala Thr Arg Gly Leu Asn Val Ala Leu Val Glu Lys Gly Asp
50 55 60

Phe Ala Ser Gly Thr Ser Ser Lys Ser Thr Lys Met Ile His Gly Gly
65 70 75 80

Val Arg Tyr Leu Glu Lys Ala Phe Trp Glu Phe Ser Lys Ala Gln Leu
85 90 95

Asp Leu Val Ile Glu Ala Leu Asn Glu Arg Lys His Leu Ile Asn Thr
100 105 110

Ala Pro His Leu Cys Thr Val Leu Pro Ile Leu Ile Pro Ile Tyr Ser
115 120 125

Thr Trp Gln Val Pro Tyr Ile Tyr Met Gly Cys Lys Phe Tyr Asp Phe
130 135 140

Phe Gly Gly Ser Gln Asn Leu Lys Lys Ser Tyr Leu Leu Ser Lys Ser
145 150 155 160

Ala Thr Val Glu Lys Ala Pro Met Leu Thr Thr Asp Asn Leu Lys Ala
165 170 175

Ser Leu Val Tyr His Asp Gly Ser Phe Asn Asp Ser Arg Leu Asn Ala
180 185 190

Thr Leu Ala Ile Thr Gly Val Glu Asn Gly Ala Thr Val Leu Ile Tyr
195 200 205

Val Glu Val Gln Lys Leu Ile Lys Asp Pro Thr Ser Gly Lys Val Ile
210 215 220

Gly Ala Glu Ala Arg Asp Val Glu Thr Asn Glu Leu Val Arg Ile Asn
225 230 235 240

Ala Lys Cys Val Val Asn Ala Thr Gly Pro Tyr Ser Asp Ala Ile Leu
245 250 255

Gln Met Asp Arg Asn Pro Ser Gly Leu Pro Asp Ser Pro Leu Asn Asp
260 265 270

Asn Ser Lys Ile Lys Ser Thr Phe Asn Gln Ile Ser Val Met Asp Pro
275 280 285

Lys Met Val Ile Pro Ser Ile Gly Val His Ile Val Leu Pro Ser Phe
290 295 300

Tyr Ser Pro Lys Asp Met Gly Leu Leu Asp Val Arg Thr Ser Asp Gly
305 310 315 320

Arg Val Met Phe Phe Leu Pro Trp Gln Gly Lys Val Leu Ala Gly Thr
325 330 335

Thr Asp Ile Pro Leu Lys Gln Val Pro Glu Asn Pro Met Pro Thr Glu
340 345 350

Ala Asp Ile Gln Asp Ile Leu Lys Glu Leu Gln His Tyr Ile Glu Phe
355 360 365

Pro Val Lys Arg Glu Asp Val Leu Ser Ala Trp Ala Gly Val Arg Pro
370 375 380

Leu Val Arg Asp Pro Arg Thr Ile Pro Ala Asp Gly Lys Lys Gly Ser
385 390 395 400

Ala Thr Gln Gly Val Val Arg Ser His Phe Leu Phe Thr Ser Asp Asn
405 410 415

Gly Leu Ile Thr Ile Ala Gly Gly Lys Trp Thr Thr Tyr Arg Gln Met
420 425 430

Ala Glu Glu Thr Val Asp Lys Val Val Glu Val Gly Gly Phe His Asn
435 440 445

Leu Lys Pro Cys His Thr Arg Asp Ile Lys Leu Ala Gly Ala Glu Glu
450 455 460

Trp Thr Gln Asn Tyr Val Ala Leu Leu Ala Gln Asn Tyr His Leu Ser
465 470 475 480

Ser Lys Met Ser Asn Tyr Leu Val Gln Asn Tyr Gly Thr Arg Ser Ser
485 490 495

Ile Ile Cys Glu Phe Phe Lys Glu Ser Met Glu Asn Lys Leu Pro Leu
500 505 510

Ser Leu Ala Asp Lys Glu Asn Asn Val Ile Tyr Ser Ser Glu Glu Asn
515 520 525

Asn Leu Val Asn Phe Asp Thr Phe Arg Tyr Pro Phe Thr Ile Gly Glu
530 535 540

Leu Lys Tyr Ser Met Gln Tyr Glu Tyr Cys Arg Thr Pro Leu Asp Phe
545 550 555 560

Leu Leu Arg Arg Thr Arg Phe Ala Phe Leu Asp Ala Lys Glu Ala Leu
565 570 575

Asn Ala Val His Ala Thr Val Lys Val Met Gly Asp Glu Phe Asn Trp
580 585 590

Ser Glu Lys Lys Arg Gln Trp Glu Leu Glu Lys Thr Val Asn Phe Ile
595 600 605

Gln Gly Arg Phe Gly Val
610

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 339 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GPSA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Met Asn Gln Arg Asn Ala Ser Met Thr Val Ile Gly Ala Gly Ser Tyr
1 5 10 15

Gly Thr Ala Leu Ala Ile Thr Leu Ala Arg Asn Gly His Glu Val Val
20 25 30

Leu Trp Gly His Asp Pro Glu His Ile Ala Thr Leu Glu Arg Asp Arg
35 40 45

Cys Asn Ala Ala Phe Leu Pro Asp Val Pro Phe Pro Asp Thr Leu His
50 55 60

Leu Glu Ser Asp Leu Ala Thr Ala Leu Ala Ala Ser Arg Asn Ile Leu
65 70 75 80

Val Val Val Pro Ser His Val Phe Gly Glu Val Leu Arg Gln Ile Lys
85 90 95

Pro Leu Met Arg Pro Asp Ala Arg Leu Val Trp Ala Thr Lys Gly Leu
100 105 110

Glu Ala Glu Thr Gly Arg Leu Leu Gln Asp Val Ala Arg Glu Ala Leu
115 120 125

Gly Asp Gln Ile Pro Leu Ala Val Ile Ser Gly Pro Thr Phe Ala Lys
130 135 140

Glu Leu Ala Ala Gly Leu Pro Thr Ala Ile Ser Leu Ala Ser Thr Asp
145 150 155 160

Gln Thr Phe Ala Asp Asp Leu Gln Gln Leu Leu His Cys Gly Lys Ser
165 170 175



Phe Arg Val Tyr Ser Asn Pro Asp Phe Ile Gly Val Gln Leu Gly Gly
180 185 190

Ala Val Lys Asn Val Ile Ala Ile Gly Ala Gly Met Ser Asp Gly Ile
195 200 205

Gly Phe Gly Ala Asn Ala Arg Thr Ala Leu Ile Thr Arg Gly Leu Ala
210 215 220

Glu Met Ser Arg Leu Gly Ala Ala Leu Gly Ala Asp Pro Ala Thr Phe
225 230 235 240

Met Gly Met Ala Gly Leu Gly Asp Leu Val Leu Thr Cys Thr Asp Asn
245 250 255

Gln Ser Arg Asn Arg Arg Phe Gly Met Met Leu Gly Gln Gly Met Asp
260 265 270

Val Gln Ser Ala Gln Glu Lys Ile Gly Gln Val Val Glu Gly Tyr Arg
275 280 285

Asn Thr Lys Glu Val Arg Glu Leu Ala His Arg Phe Gly Val Glu Met
290 295 300

Pro Ile Thr Glu Glu Ile Tyr Gln Val Leu Tyr Cys Gly Lys Asn Ala
305 310 315 320

Arg Glu Ala Ala Leu Thr Leu Leu Gly Arg Ala Arg Lys Asp Glu Arg
325 330 335

Ser Ser His



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 501 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: GLPD 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

Met Glu Thr Lys Asp Leu Ile Val Ile Gly Gly Gly Ile Asn Gly Ala
1 5 10 15

Gly Ile Ala Ala Asp Ala Ala Gly Arg Gly Leu Ser Val Leu Met Leu
20 25 30

Glu Ala Gln Asp Leu Ala Cys Ala Thr Ser Ser Ala Ser Ser Lys Leu
35 40 45

Ile His Gly Gly Leu Arg Tyr Leu Glu His Tyr Glu Phe Arg Leu Val
50 55 60

Ser Glu Ala Leu Ala Glu Arg Glu Val Leu Leu Lys Met Ala Pro His
65 70 75 80

Ile Ala Phe Pro Met Arg Phe Arg Leu Pro His Arg Pro His Leu Arg
85 90 95

Pro Ala Trp Met Ile Arg Ile Gly Leu Phe Met Tyr Asp His Leu Gly
100 105 110

Lys Arg Thr Ser Leu Pro Gly Ser Thr Gly Leu Arg Phe Gly Ala Asn
115 120 125

Ser Val Leu Lys Pro Glu Ile Lys Arg Gly Phe Glu Tyr Ser Asp Cys
130 135 140

Trp Val Asp Asp Ala Arg Leu Val Leu Ala Asn Ala Gln Met Val Val
145 150 155 160

Arg Lys Gly Gly Glu Val Leu Thr Arg Thr Arg Ala Thr Ser Ala Arg
165 170 175

Arg Glu Asn Gly Leu Trp Ile Val Glu Ala Glu Asp Ile Asp Thr Gly
180 185 190

Lys Lys Tyr Ser Trp Gln Ala Arg Gly Leu Val Asn Ala Thr Gly Pro
195 200 205

Trp Val Lys Gln Phe Phe Asp Asp Gly Met His Leu Pro Ser Pro Tyr
210 215 220

Gly Ile Arg Leu Ile Lys Gly Ser His Ile Val Val Pro Arg Val His
225 230 235 240

Thr Gln Lys Gln Ala Tyr Ile Leu Gln Asn Glu Asp Lys Arg Ile Val
245 250 255

Phe Val Ile Pro Trp Met Asp Glu Phe Ser Ile Ile Gly Thr Thr Asp
260 265 270

Val Glu Tyr Lys Gly Asp Pro Lys Ala Val Lys Ile Glu Glu Ser Glu
275 280 285

Ile Asn Tyr Leu Leu Asn Val Tyr Asn Thr His Phe Lys Lys Gln Leu
290 295 300

Ser Arg Asp Asp Ile Val Trp Thr Tyr Ser Gly Val Arg Pro Leu Cys
305 310 315 320

Asp Asp Glu Ser Asp Ser Pro Gln Ala Ile Thr Arg Asp Tyr Thr Leu
325 330 335

Asp Ile His Asp Glu Asn Gly Lys Ala Pro Leu Leu Ser Val Phe Gly
340 345 350

Gly Lys Leu Thr Thr Tyr Arg Lys Leu Ala Glu His Ala Leu Glu Lys
355 360 365



Leu Thr Pro Tyr Tyr Gln Gly Ile Gly Pro Ala Trp Thr Lys Glu Ser
370 375 380

Val Leu Pro Gly Gly Ala Ile Glu Gly Asp Arg Asp Asp Tyr Ala Ala
385 390 395 400

Arg Leu Arg Arg Arg Tyr Pro Phe Leu Thr Glu Ser Leu Ala Arg His
405 410 415

Tyr Ala Arg Thr Tyr Gly Ser Asn Ser Glu Leu Leu Leu Gly Asn Ala
420 425 430

Gly Thr Val Ser Asp Leu Gly Glu Asp Phe Gly His Glu Phe Tyr Glu
435 440 445

Ala Glu Leu Lys Tyr Leu Val Asp His Glu Trp Val Arg Arg Ala Asp
450 455 460

Asp Ala Leu Trp Arg Arg Thr Lys Gln Gly Met Trp Leu Asn Ala Asp
465 470 475 480

Gln Gln Ser Arg Val Ser Gln Trp Leu Val Glu Tyr Thr Gln Gln Arg
485 490 495

Leu Ser Leu Ala Ser
500

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 542 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GLPABC

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Met Lys Thr Arg Asp Ser Gln Ser Ser Asp Val Ile Ile Ile Gly Gly
1 5 10 15

Gly Ala Thr Gly Ala Gly Ile Ala Arg Asp Cys Ala Leu Arg Gly Leu
20 25 30

Arg Val Ile Leu Val Glu Arg His Asp Ile Ala Thr Gly Ala Thr Gly
35 40 45



Arg Asn His Gly Leu Leu His Ser Gly Ala Arg Tyr Ala Val Thr Asp 
50 55 60 



Ala Glu Ser Ala Arg Glu Cys Ile Ser Glu Asn Gln Ile Leu Lys Arg
65 70 75 80

Ile Ala Arg His Cys Val Glu Pro Thr Asn Gly Leu Phe Ile Thr Leu
85 90 95

Pro Glu Asp Asp Leu Ser Phe Gln Ala Thr Phe Ile Arg Ala Cys Glu
100 105 110

Glu Ala Gly Ile Ser Ala Glu Ala Ile Asp Pro Gln Gln Ala Arg Ile
115 120 125

Ile Glu Pro Ala Val Asn Pro Ala Leu Ile Gly Ala Val Lys Val Pro
130 135 140

Asp Gly Thr Val Asp Pro Phe Arg Leu Thr Ala Ala Asn Met Leu Asp
145 150 155 160

Ala Lys Glu His Gly Ala Val Ile Leu Thr Ala His Glu Val Thr Gly
165 170 175

Leu Ile Arg Glu Gly Ala Thr Val Cys Gly Val Arg Val Arg Asn His
180 185 190

Leu Thr Gly Glu Thr Gln Ala Leu His Ala Pro Val Val Val Asn Ala
195 200 205

Ala Gly Ile Trp Gly Gln His Ile Ala Glu Tyr Ala Asp Leu Arg Ile
210 215 220

Arg Met Phe Pro Ala Lys Gly Ser Leu Leu Ile Met Asp His Arg Ile
225 230 235 240

Asn Gln His Val Ile Asn Arg Cys Arg Lys Pro Ser Asp Ala Asp Ile
245 250 255

Leu Val Pro Gly Asp Thr Ile Ser Leu Ile Gly Thr Thr Ser Leu Arg
260 265 270

Ile Asp Tyr Asn Glu Ile Asp Asp Asn Arg Val Thr Ala Glu Glu Val
275 280 285

Asp Ile Leu Leu Arg Glu Gly Glu Lys Leu Ala Pro Val Met Ala Lys
290 295 300

Thr Arg Ile Leu Arg Ala Tyr Ser Gly Val Arg Pro Leu Val Ala Ser
305 310 315 320

Asp Asp Asp Pro Ser Gly Arg Asn Leu Ser Arg Gly Ile Val Leu Leu
325 330 335

Asp His Ala Glu Arg Asp Gly Leu Asp Gly Phe Ile Thr Ile Thr Gly
340 345 350



Gly Lys Leu Met Thr Tyr Arg Leu Met Ala Glu Trp Ala Thr Asp Ala 
355 360 365 



Val Cys Arg Lys Leu Gly Asn Thr Arg Pro Cys Thr Thr Ala Asp Leu 
370 375 380 



Ala Leu Pro Gly Ser Gln Glu Pro Ala Glu Val Thr Leu Arg Lys Val
385 390 395 400

Ile Ser Leu Pro Ala Pro Leu Arg Gly Ser Ala Val Tyr Arg His Gly
405 410 415

Asp Arg Thr Pro Ala Trp Leu Ser Glu Gly Arg Leu His Arg Ser Leu
420 425 430

Val Cys Glu Cys Glu Ala Val Thr Ala Gly Glu Val Gln Tyr Ala Val
435 440 445

Glu Asn Leu Asn Val Asn Ser Leu Leu Asp Leu Arg Arg Arg Thr Arg
450 455 460

Val Gly Met Gly Thr Cys Gln Gly Glu Leu Cys Ala Cys Arg Ala Ala
465 470 475 480

Gly Leu Leu Gln Arg Phe Asn Val Thr Thr Ser Ala Gln Ser Ile Glu
485 490 495

Gln Leu Ser Thr Phe Leu Asn Glu Arg Trp Lys Gly Val Gln Pro Ile
500 505 510

Ala Trp Gly Asp Ala Leu Arg Glu Ser Glu Phe Thr Arg Trp Val Tyr
515 520 525

Gln Gly Leu Cys Gly Leu Glu Lys Glu Gln Lys Asp Ala Leu
530 535 540

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 250 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: GPP2 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

Met Gly Leu Thr Thr Lys Pro Leu Ser Leu Lys Val Asn Ala Ala Leu
1 5 10 15

Phe Asp Val Asp Gly Thr Ile Ile Ile Ser Gln Pro Ala Ile Ala Ala
20 25 30

Phe Trp Arg Asp Phe Gly Lys Asp Lys Pro Tyr Phe Asp Ala Glu His
35 40 45

Val Ile Gln Val Ser His Gly Trp Arg Thr Phe Asp Ala Ile Ala Lys
50 55 60

Phe Ala Pro Asp Phe Ala Asn Glu Glu Tyr Val Asn Lys Leu Glu Ala
65 70 75 80

Glu Ile Pro Val Lys Tyr Gly Glu Lys Ser Ile Glu Val Pro Gly Ala
85 90 95

Val Lys Leu Cys Asn Ala Leu Asn Ala Leu Pro Lys Glu Lys Trp Ala
100 105 110

Val Ala Thr Ser Gly Thr Arg Asp Met Ala Gln Lys Trp Phe Glu His
115 120 125

Leu Gly Ile Arg Arg Pro Lys Tyr Phe Ile Thr Ala Asn Asp Val Lys
130 135 140

Gln Gly Lys Pro His Pro Glu Pro Tyr Leu Lys Gly Arg Asn Gly Leu
145 150 155 160

Gly Tyr Pro Ile Asn Glu Gln Asp Pro Ser Lys Ser Lys Val Val Val
165 170 175

Phe Glu Asp Ala Pro Ala Gly Ile Ala Ala Gly Lys Ala Ala Gly Cys
180 185 190

Lys Ile Ile Gly Ile Ala Thr Thr Phe Asp Leu Asp Phe Leu Lys Glu
195 200 205

Lys Gly Cys Asp Ile Ile Val Lys Asn His Glu Ser Ile Arg Val Gly
210 215 220

Gly Tyr Asn Ala Glu Thr Asp Glu Val Glu Phe Ile Phe Asp Asp Tyr
225 230 235 240

Leu Tyr Ala Lys Asp Asp Leu Leu Lys Trp 
245 250 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 709 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: GUT1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 



Met Phe Pro Ser Leu Phe Arg Leu Val Val Phe Ser Lys Arg Tyr Ile
1 5 10 15

Phe Arg Ser Ser Gln Arg Leu Tyr Thr Ser Leu Lys Gln Glu Gln Ser
20 25 30

Arg Met Ser Lys Ile Met Glu Asp Leu Arg Ser Asp Tyr Val Pro Leu
35 40 45

Ile Ala Ser Ile Asp Val Gly Thr Thr Ser Ser Arg Cys Ile Leu Phe
50 55 60

Asn Arg Trp Gly Gln Asp Val Ser Lys His Gln Ile Glu Tyr Ser Thr
65 70 75 80

Ser Ala Ser Lys Gly Lys Ile Gly Val Ser Gly Leu Arg Arg Pro Ser
85 90 95

Thr Ala Pro Ala Arg Glu Thr Pro Asn Ala Gly Asp Ile Lys Thr Ser
100 105 110

Gly Lys Pro Ile Phe Ser Ala Glu Gly Tyr Ala Ile Gln Glu Thr Lys
115 120 125

Phe Leu Lys Ile Glu Glu Leu Asp Leu Asp Phe His Asn Glu Pro Thr
130 135 140

Leu Lys Phe Pro Lys Pro Gly Trp Val Glu Cys His Pro Gln Lys Leu
145 150 155 160

Leu Val Asn Val Val Gln Cys Leu Ala Ser Ser Leu Leu Ser Leu Gln
165 170 175

Thr Ile Asn Ser Glu Arg Val Ala Asn Gly Leu Pro Pro Tyr Lys Val
180 185 190

Ile Cys Met Gly Ile Ala Asn Met Arg Glu Thr Thr Ile Leu Trp Ser
195 200 205

Arg Arg Thr Gly Lys Pro Ile Val Asn Tyr Gly Ile Val Trp Asn Asp
210 215 220

Thr Arg Thr Ile Lys Ile Val Arg Asp Lys Trp Gln Asn Thr Ser Val
225 230 235 240

Asp Arg Gln Leu Gln Leu Arg Gln Lys Thr Gly Leu Pro Leu Leu Ser
245 250 255

Thr Tyr Phe Ser Cys Ser Lys Leu Arg Trp Phe Leu Asp Asn Glu Pro
260 265 270

Leu Cys Thr Lys Ala Tyr Glu Glu Asn Asp Leu Met Phe Gly Thr Val
275 280 285

Asp Thr Trp Leu Ile Tyr Gln Leu Thr Lys Gln Lys Ala Phe Val Ser
290 295 300

Asp Val Thr Asn Ala Ser Arg Thr Gly Phe Met Asn Leu Ser Thr Leu
305 310 315 320

Lys Tyr Asp Asn Glu Leu Leu Glu Phe Trp Gly Ile Asp Lys Asn Leu
325 330 335



Ile His Met Pro Glu Ile Val Ser Ser Ser Gln Tyr Tyr Gly Asp Phe
340 345 350

Gly Ile Pro Asp Trp Ile Met Glu Lys Leu His Asp Ser Pro Lys Thr
355 360 365

Val Leu Arg Asp Leu Val Lys Arg Asn Leu Pro Ile Gln Gly Cys Leu
370 375 380

Gly Asp Gln Ser Ala Ser Met Val Gly Gln Leu Ala Tyr Lys Pro Gly
385 390 395 400

Ala Ala Lys Cys Thr Tyr Gly Thr Gly Cys Phe Leu Leu Tyr Asn Thr
405 410 415

Gly Thr Lys Lys Leu Ile Ser Gln His Gly Ala Leu Thr Thr Leu Ala
420 425 430

Phe Trp Phe Pro His Leu Gln Glu Tyr Gly Gly Gln Lys Pro Glu Leu
435 440 445

Ser Lys Pro His Phe Ala Leu Glu Gly Ser Val Ala Val Ala Gly Ala
450 455 460

Val Val Gln Trp Leu Arg Asp Asn Leu Arg Leu Ile Asp Lys Ser Glu
465 470 475 480

Asp Val Gly Pro Ile Ala Ser Thr Val Pro Asp Ser Gly Gly Val Val
485 490 495

Phe Val Pro Ala Phe Ser Gly Leu Phe Ala Pro Tyr Trp Asp Pro Asp
500 505 510

Ala Arg Ala Thr Ile Met Gly Met Ser Gln Phe Thr Thr Ala Ser His
515 520 525

Ile Ala Arg Ala Ala Val Glu Gly Val Cys Phe Gln Ala Arg Ala Ile
530 535 540

Leu Lys Ala Met Ser Ser Asp Ala Phe Gly Glu Gly Ser Lys Asp Arg
545 550 555 560

Asp Phe Leu Glu Glu Ile Ser Asp Val Thr Tyr Glu Lys Ser Pro Leu
565 570 575

Ser Val Leu Ala Val Asp Gly Gly Met Ser Arg Ser Asn Glu Val Met
580 585 590

Gln Ile Gln Ala Asp Ile Leu Gly Pro Cys Val Lys Val Arg Arg Ser
595 600 605



Pro Thr Ala Glu Cys Thr Ala Leu Gly Ala Ala Ile Ala Ala Asn Met
610 615 620

Ala Phe Lys Asp Val Asn Glu Arg Pro Leu Trp Lys Asp Leu His Asp
625 630 635 640

Val Lys Lys Trp Val Phe Tyr Asn Gly Met Glu Lys Asn Glu Gln Ile
645 650 655

Ser Pro Glu Ala His Pro Asn Leu Lys Ile Phe Arg Ser Glu Ser Asp
660 665 670

Asp Ala Glu Arg Arg Lys His Trp Lys Tyr Trp Glu Val Ala Val Glu
675 680 685

Arg Ser Lys Gly Trp Leu Lys Asp Ile Glu Gly Glu His Glu Gln Val
690 695 700

Leu Glu Asn Phe Gln
705

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12145 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: PHK28-26 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

GTCGACCACC ACGGTGGTGA CTTTAATGCC GCTCTCATGC AGCAGCTCGG TGGCGGTCTC 60 

AAAATTCAGG ATGTCGCCGG TATAGTTTTT GATAATCAGC AAGACGCCTT CGCCGCCGTC 120

AATTTGCATC GCGCATTCAA ACATTTTGTC CGGCGTCGGC GAGGTGAATA TTTCCCCCGG 180 

ACAGGCGCCG GAGAGCATGC CCTGGCCGAT ATAGCCGCAG TGCATCGGTT CATGTCCGCT 240 

GCCGCCGCCG GAGAGCAGGG CCACCTTGCC AGCCACCGGC GCGTCGGTGC GGGTCACATA 300 

CAGCGGGTCC TGATGCAGGG TCAGCTGCGG ATGGGCTTTA GCCAGCCCCT GTAATTGTTC 360 

ATTCAGTACA TCTTCAACAC GGTTAATCAG CTTTTTCATT ATTCAGTGCT CCGTTGGAGA 420

AGGTTCGATG CCGCCTCTCT GCTGGCGGAG GCGGTCATCG CGTAGGGGTA TCGTCTGACG 480

GTGGAGCGTG CCTGGCGATA TGATGATTCT GGCTGAGCGG ACGAAAAAAA GAATGCCCCG 540

ACGATCGGGT TTCATTACGA AACATTGCTT CCTGATTTTG TTTCTTTATG GAACGTTTTT 600 

GCTGAGGATA TGGTGAAAAT GCGAGCTGGC GCGCTTTTTT TCTTCTGCCA TAAGCGGCGG 660 

TCAGGATAGC CGGCGAAGCG GGTGGGAAAA AATTTTTTGC TGATTTTCTG CCGACTGCGG 720

GAGAAAAGGC GGTCAAACAC GGAGGATTGT AAGGGCATTA TGCGGCAAAG GAGCGGATCG 780

GGATCGCAAT CCTGACAGAG ACTAGGGTTT TTTGTTCCAA TATGGAACGT AAAAAATTAA 840 

CCTGTGTTTC ATATCAGAAC AAAAAGGCGA AAGATTTTTT TGTTCCCTGC CGGCCCTACA 900 

GTGATCGCAC TGCTCCGGTA CGCTCCGTTC AGGCCGCGCT TCACTGGCCG GCGCGGATAA 960 

CGCCAGGGCT CATCATGTCT ACATGCGCAC TTATTTGAGG GTGAAAGGAA TGCTAAAAGT 1020 

TATTCAATCT CCAGCCAAAT ATCTTCAGGG TCCTGATGCT GCTGTTCTGT TCGGTCAATA 1080 

TGCCAAAAAC CTGGCGGAGA GCTTCTTCGT CATCGCTGAC GATTTCGTAA TGAAGCTGGC 1140 

GGGAGAGAAA GTGGTGAATG GCCTGCAGAG CCACGATATT CGCTGCCATG CGGAACGGTT 1200

TAACGGCGAA TGCAGCCATG CGGAAATCAA CCGTCTGATG GCGATTTTGC AAAAACAGGG 1260

CTGCCGCGGC GTGGTCGGGA TCGGCGGTGG TAAAACCCTC GATACCGCGA AGGCGATCGG 1320 

TTACTACCAG AAGCTGCCGG TGGTGGTGAT CCCGACCATC GCCTCGACCG ATGCGCCAAC 1380

CAGCGCGCTG TCGGTGATCT ACACCGAAGC GGGCGAGTTT GAAGAGTATC TGATCTATCC 1440 

GAAAAACCCG GATATGGTGG TGATGGACAC GGCGATTATC GCCAAAGCGC CGGTACGCCT 1500

GCTGGTCTCC GGCATGGGCG ATGCGCTCTC CACCTGGTTC GAGGCCAAAG CTTGCTACGA 1560 

TGCGCGCGCC ACCAGCATGG CCGGAGGACA GTCCACCGAG GCGGCGCTGA GCCTCGCCCG 1620 

CCTGTGCTAT GATACGCTGC TGGCGGAGGG CGAAAAGGCC CGTCTGGCGG CGCAGGCCGG 1680 

GGTAGTGACC GAAGCGCTGG AGCGCATCAT CGAGGCGAAC ACTTACCTCA GCGGCATTGG 1740 

CTTTGAAAGC AGTGGCCTGG CCGCTGCCCA TGCAATCCAC AACGGTTTCA CCATTCTTGA 1800

AGAGTGCCAT CACCTGTATC ACGGTGAGAA AGTGGCCTTC GGTACCCTGG CGCAGCTGGT 1860

GCTGCAGAAC AGCCCGATGG ACGAGATTGA AACGGTGCAG GGCTTCTGCC AGCGCGTCGG 1920

CCTGCCGGTG ACGCTCGCGC AGATGGGCGT CAAAGAGGGG ATCGACGAGA AAATCGCCGC 1980 

GGTGGCGAAA GCTACCTGCG CGGAAGGGGA AACCATCCAT AATATGCCGT TTGCGGTGAC 2040

CCCGGAGAGC GTCCATGCCG CTATCCTCAC CGCCGATCTG TTAGGCCAGC AGTGGCTGGC 2100 

GCGTTAATTC GCGGTGGCTA AACCGCTGGC CCAGGTCAGC GGTTTTTCTT TCTCCCCTCC 2160 

GGCAGTCGCT GCCGGAGGGG TTCTCTATGG TACAACGCGG AAAAGGATAT GACTGTTCAG 2220 

ACTCAGGATA CCGGGAAGGC GGTCTCTTCC GTCATTGCCC AGTCATGGCA CCGCTGCAGC 2280

AAGTTTATGC AGCGCGAAAC CTGGCAAACG CCGCACCAGG CCCAGGGCCT GACCTTCGAC 2340

TCCATCTGTC GGCGTAAAAC CGCGCTGCTC ACCATCGGCC AGGCGGCGCT GGAAGACGCC 2400 



TGGGAGTTTA TGGACGGCCG CCCCTGCGCG CTGTTTATTC TTGATGAGTC CGCCTGCATC 2460 

CTGAGCCGTT GCGGCGAGCC GCAAACCCTG GCCCAGCTGG CTGCCCTGGG ATTTCGCGAC 2520

GGCAGCTATT GTGCGGAGAG CATTATCGGC ACCTGCGCGC TGTCGCTGGC CGCGATGCAG 2580

GGCCAGCCGA TCAACACCGC CGGCGATCGG CATTTTAAGC AGGCGCTACA GCCATGGAGT 2640

TTTTGCTCGA CGCCGGTGTT TGATAACCAC GGGCGGCTGT TCGGCTCTAT CTCGCTTTGC 2700

TGTCTGGTCG AGCACCAGTC CAGCGCCGAC CTCTCCCTGA CGCTGGCCAT CGCCCGCGAG 2760

GTGGGTAACT CCCTGCTTAC CGACAGCCTG CTGGCGGAAT CCAACCGTCA CCTCAATCAG 2820

ATGTACGGCC TGCTGGAGAG CATGGACGAT GGGGTGATGG CGTGGAACGA ACAGGGCGTG 2880

CTGCAGTTTC TCAATGTTCA GGCGGCGAGA CTGCTGCATC TTGATGCTCA GGCCAGCCAG 2940

GGGAAAAATA TCGCCGATCT GGTGACCCTC CCGGCGCTGC TGCGCCGCGC CATCAAACAC 3000

GCCCGCGGCC TGAATCACGT CGAAGTCACC TTTGAAAGTC AGCATCAGTT TGTCGATGCG 3060

GTGATCACCT TAAAACCGAT TGTCGAGGCG CAAGGCAACA GTTTTATTCT GCTGCTGCAT 3120 

CCGGTGGAGC AGATGCGGCA GCTGATGACC AGCCAGCTCG GTAAAGTCAG CCACACCTTT 3180 

GAGCAGATGT CTGCCGACGA TCCGGAAACC CGACGCCTGA TCCACTTTGG CCGCCAGGCG 3240 

GCGCGCGGCG GCTTCCCGGT GCTACTGTGC GGCGAAGAGG GGGTCGGGAA AGAGCTGCTG 3300

AGCCAGGCTA TTCACAATGA AAGCGAACGG GCGGGCGGCC CCTACATCTC CGTCAACTGC 3360 

CAGCTATATG CCGACAGCGT GCTGGGCCAG GACTTTATGG GCAGCGCCCC TACCGACGAT 3420

GAAAATGGTC GCCTGAGCCG CCTTGAGCTG GCCAACGGCG GCACCCTGTT TCTGGAAAAG 3480

ATCGAGTATC TGGCGCCGGA GCTGCAGTCG GCTCTGCTGC AGGTGATTAA GCAGGGCGTG 3540

CTCACCCGCC TCGACGCCCG GCGCCTGATC CCGGTGGATG TGAAGGTGAT TGCCACCACC 3600

ACCGTCGATC TGGCCAATCT GGTGGAACAG AACCGCTTTA GCCGCCAGCT GTACTATGCG 3660

CTGCACTCCT TTGAGATCGT CATCCCGCCG CTGCGCGCCC GACGCAACAG TATTCCGTCG 3720

CTGGTGCATA ACCGGTTGAA GAGCCTGGAG AAGCGTTTCT CTTCGCGACT GAAAGTGGAC 3780

GATGACGCGC TGGCACAGCT GGTGGCCTAC TCGTGGCCGG GGAATGATTT TGAGCTCAAC 3840

AGCGTCATTG AGAATATCGC CATCAGCAGC GACAACGGCC ACATTCGCCT GAGTAATCTG 3900

CCGGAATATC TCTTTTCCGA GCGGCCGGGC GGGGATAGCG CGTCATCGCT GCTGCCGGCC 3960

AGCCTGACTT TTAGCGCCAT CGAAAAGGAA GCTATTATTC ACGCCGCCCG GGTGACCAGC 4020

GGGCGGGTGC AGGAGATGTC GCAGCTGCTC AATATCGGCC GCACCACCCT GTGGCGCAAA 4080

ATGAAGCAGT ACGATATTGA CGCCAGCCAG TTCAAGCGCA AGCATCAGGC CTAGTCTCTT 4140

CGATTCGCGC CATGGAGAAC AGGGCATCCG ACAGGCGATT GCTGTAGCGT TTGAGCGCGT 4200

CGCGCAGCGG ATGCGCGCGG TCCATGGCCG TCAGCAGGCG TTCGAGCCGA CGGGACTGGG 4260 

TGCGCGCCAC GTGCAGCTGG GCAGAGGCGA GATTCCTCCC CGGGATCACG AACTGTTTTA 4320

ACGGGCCGCT CTCGGCCATA TTGCGGTCGA TAAGCCGCTC CAGGGCGGTG ATCTCCTCTT 4380

CGCCGATCGT CTGGCTCAGG CGGGTCAGGC CCCGCGCATC GCTGGCCAGT TCAGCCCCCA 4440

GCACGAACAG CGTCTGCTGA ATATGGTGCA GGCTTTCCCG CAGCCCGGCG TCGCGGGTCG 4500 

TGGCGTAGCA GACGCCCAGC TGGGATATCA GTTCATCGAC GGTGCCGTAG GCCTCGACGC 4560 

GAATATGGTC TTTCTCGATG CGGCTGCCGC CGTACAGGGC GGTGGTGCCT TTATCCCCGG 4620

TGCGGGTATA GATACGATAC ATTCAGTTTC TCTCACTTAA CGGCAGGACT TTAACCAGCT 4680 

GCCCGGCGTT GGCGCCGAGC GTACGCAGTT GATCGTCGCT ATCGGTGACG TGTCCGGTAG 4740 

CCAGCGGCGC GTCCGCCGGC AGCTGGGCAT GAGTGAGGGC TATCTCGCCG GACGCGCTGA 4800

GCCCGATACC CACCCGCAGG GGCGAGCTTC TGGCCGCCAG GGCGCCCAGC GCAGCGGCGT 4860 

CACCGCCTCC GTCATAGGTT ATGGTCTGGC AGGGGACCCC CTGCTCCTCC AGCCCCCAGC 4920

ACAGCTCATT GATGGCGCCG GCATGGTGCC CGCGCGGATC GTAAAACAGG CGTACGCCTG 4980

GCGGTGAAAG CGACATGACG GTCCCCTCGT TAACACTCAG AATGCCTGGC GGAAAATCGC 5040

GGCAATCTCC TGCTCGTTGC CTTTACGCGG GTTCGAGAAC GCATTGCCGT CTTTTAGAGC 5100 

CATCTCCGCC ATGTAGGGGA AGTCGGCCTC TTTTACCCCC AGATCGCGCA GATGCTGCGG 5160 

AATACCGATA TCCATCGACA GACGCGTGAT AGCGGCGATG GCTTTTTCCG CCGCGTCGAG 5220

AGTGGACAGT CCGGTGATAT TTTCGCCCAT CAGTTCAGCG ATATCGGCGA ATTTCTCCGG 5280

GTTGGCGATC AGGTTGTAGC GCGCCACATG CGGCAGCAGG ACAGCGTTGG CCACGCCGTG 5340

CGGCATGTCG TACAGGCCGC CCAGCTGGTG CGCCATGGCG TGCACGTAGC CGAGGTTGGC 5400

GTTATTGAAA GCCATCCCGG CCAGCAGAGA AGCATAGGCC ATGTTTTCCC GCGCCTGCAG 5460 

ATTGCTGCCG AGGGCCACGG CCTGGCGCAG GTTGCGGGCG ATGAGGCGGA TCGCCTGCAT 5520

GGCGGCGGCG TCCGTCACCG GGTTAGCGTC TTTGGAGATA TAGGCCTCTA CGGCGTGGGT 5580 

CAGGGCATCC ATCCCGGTCG CCGCGGTCAG GGCGGCCGGT TTACCGATCA TCAGCAGTGG 5640

ATCGTTGATA GAGACCGACG GCAGTTTGCG CCAGCTGACG ATCACAAACT TCACTTTGGT 5700

TTCGGTGTTG GTCAGGACGC AGTGGCGGGT GACCTCGCTG GCGGTGCCGG CGGTGGTATT 5760 

GACCGCGACG ATAGGCGGCA GCGGGTTGGT CAGGGTCTCG ATTCCGGCAT ACTGGTACAG 5820



ATCGCCCTCA TGGGTGGCGG CGATGCCGAT GCCTTTGCCG CAATCGTGCG GGCTGCCGCC 5880 

GCCCACGGTG ACGATGATGT CGCACTGTTC GCGGCGAAAC ACGGCGAGGC CGTCGCGCAC 5940 

GTTGGTGTCT TTCGGGTTCG GCTCGACGCC GTCAAAGATC GCCACCTCGA TCCCGGCCTC 6000 

CCGCAGATAA TGCAGGGTTT TGTCCACCGC GCCATCTTTA ATTGCCCGCA GGCCTTTGTC 6060 

GGTGACCAGC AGGGCTTTTT TCCCCCCCAG CAGCTGGCAG CGTTCGCCGA CTACGGAAAT 6120

GGCGTTGGGG CCAAAAAAGT TAACGTTTGG CACCAGATAA TCAAACATAC GATAGCTCAT 6180

AATATACCTT CTCGCTTCAG GTTATAATGC GGAAAAACAA TCCAGGGCGC ACTGGGCTAA 6240

TAATTGATCC TGCTCGACCG TACCGCCGCT AACGCCGACG GCGCCAATTA CCTGCTCATT 6300

AAAAATAACT GGCAGGCCGC CGCCAAAAAT AATAATTCGC TGTTGGTTGG TTAGCTGCAG 6360 

ACCGTACAGA GATTGTCCTG GCTGGACCGC TGACGTAATT TCATGGGTAC CTTGCTTCAG 6420

GCTGCAGGCG CTCCAGGCTT TATTCAGGGA AATATCGCAG CTGGAGACGA AGGCCTCGTC 6480

CATCCGCTGG ATAAGCAGCG TGTTGCCTCC GCGGTCAACT ACGGAAAACA CCACCGCCAC 6540

GTTGATCTCA GTGGCTTTTT TTTCCACCGC CGCCGCCATT TGCTGGGCGG CGGCCAGGGT 6600 

GATTGTCTGA ACTTGTTGGC TCTTGTTCAT CATTCTCTCC CGCACCAGGA TAACGCTGGC 6660 

GCGAATAGTC AGTAGGGGGC GATAGTAAAA AACTATTACC ATTCGGTTGG CTTGCTTTAT 6720

TTTTGTCAGC GTTATTTTGT CGCCCGCCAT GATTTAGTCA ATAGGGTTAA AATAGCGTCG 6780 

GAAAAACGTA ATTAAGGGCG TTTTTTATTA ATTGATTTAT ATCATTGCGG GCGATCACAT 6840

TTTTTATTTT TGCCGCCGGA GTAAAGTTTC ATAGTGAAAC TGTCGGTAGA TTTCGTGTGC 6900

CAAATTGAAA CGAAATTAAA TTTATTTTTT TCACCACTGG CTCATTTAAA GTTCCGCTAT 6960

TGCCGGTAAT GGCCGGGCGG CAACGACGCT GGCCCGGCGT ATTCGCTACC GTCTGCGGAT 7020

TTCACCTTTT GAGCCGATGA ACAATGAAAA GATCAAAACG ATTTGCAGTA CTGGCCCAGC 7080 

GCCCCGTCAA TCAGGACGGG CTGATTGGCG AGTGGCCTGA AGAGGGGCTG ATCGCCATGG 7140 

ACAGCCCCTT TGACCCGGTC TCTTCAGTAA AAGTGGACAA CGGTCTGATC GTCGAACTGG 7200 

ACGGCAAACG CCGGGACCAG TTTGACATGA TCGACCGATT TATCGCCGAT TACGCGATCA 7260 

ACGTTGAGCG CACAGAGCAG GCAATGCGCC TGGAGGCGGT GGAAATAGCC CGTATGCTGG 7320

TGGATATTCA CGTCAGCCGG GAGGAGATCA TTGCCATCAC TACCGCCATC ACGCCGGCCA 7380

AAGCGGTCGA GGTGATGGCG CAGATGAACG TGGTGGAGAT GATGATGGCG CTGCAGAAGA 7440

TGCGTGCCCG CCGGACCCCC TCCAACCAGT GCCACGTCAC CAATCTCAAA GATAATCCGG 7500 

TGCAGATTGC CGCTGACGCC GCCGAGGCCG GGATCCGCGG CTTCTCAGAA CAGGAGACCA 7560 



CGGTCGGTAT CGCGCGCTAC GCGCCGTTTA ACGCCCTGGC GCTGTTGGTC GGTTCGCAGT 7620

GCGGCCGCCC CGGCGTGTTG ACGCAGTGCT CGGTGGAAGA GGCCACCGAG CTGGAGCTGG 7680

GCATGCGTGG CTTAACCAGC TACGCCGAGA CGGTGTCGGT CTACGGCACC GAAGCGGTAT 7740

TTACCGACGG CGATGATACG CCGTGGTCAA AGGCGTTCCT CGCCTCGGCC TACGCCTCCC 7800 

GCGGGTTGAA AATGCGCTAC ACCTCCGGCA CCGGATCCGA AGCGCTGATG GGCTATTCGG 7860 

AGAGCAAGTC GATGCTCTAC CTCGAATCGC GCTGCATCTT CATTACTAAA GGCGCCGGGG 7920

TTCAGGGACT GCAAAACGGC GCGGTGAGCT GTATCGGCAT GACCGGCGCT GTGCCGTCGG 7980

GCATTCGGGC GGTGCTGGCG GAAAACCTGA TCGCCTCTAT GCTCGACCTC GAAGTGGCGT 8040

CCGCCAACGA CCAGACTTTC TCCCACTCGG ATATTCGCCG CACCGCGCGC ACCCTGATGC 8100 

AGATGCTGCC GGGCACCGAC TTTATTTTCT CCGGCTACAG CGCGGTGCCG AACTACGACA 8160 

ACATGTTCGC CGGCTCGAAC TTCGATGCGG AAGATTTTGA TGATTACAAC ATCCTGCAGC 8220

GTGACCTGAT GGTTGACGGC GGCCTGCGTC CGGTGACCGA GGCGGAAACC ATTGCCATTC 8280

GCCAGAAAGC GGCGCGGGCG ATCCAGGCGG TTTTCCGCGA GCTGGGGCTG CCGCCAATCG 8340

CCGACGAGGA GGTGGAGGCC GCCACCTACG CGCACGGCAG CAACGAGATG CCGCCGCGTA 8400

ACGTGGTGGA GGATCTGAGT GCGGTGGAAG AGATGATGAA GCGCAACATC ACCGGCCTCG 8460

ATATTGTCGG CGCGCTGAGC CGCAGCGGCT TTGAGGATAT CGCCAGCAAT ATTCTCAATA 8520

TGCTGCGCCA GCGGGTCACC GGCGATTACC TGCAGACCTC GGCCATTCTC GATCGGCAGT 8580

TCGAGGTGGT GAGTGCGGTC AACGACATCA ATGACTATCA GGGGCCGGGC ACCGGCTATC 8640 

GCATCTCTGC CGAACGCTGG GCGGAGATCA AAAATATTCC GGGCGTGGTT CAGCCCGACA 8700 

CCATTGAATA AGGCGGTATT CCTGTGCAAC AGACAACCCA AATTCAGCCC TCTTTTACCC 8760 

TGAAAACCCG CGAGGGCGGG GTAGCTTCTG CCGATGAACG CGCCGATGAA GTGGTGATCG 8820

GCGTCGGCCC TGCCTTCGAT AAACACCAGC ATCACACTCT GATCGATATG CCCCATGGCG 8880

CGATCCTCAA AGAGCTGATT GCCGGGGTGG AAGAAGAGGG GCTTCACGCC CGGGTGGTGC 8940

GCATTCTGCG CACGTCCGAC GTCTCCTTTA TGGCCTGGGA TGCGGCCAAC CTGAGCGGCT 9000 

CGGGGATCGG CATCGGTATC CAGTCGAAGG GGACCACGGT CATCCATCAG CGCGATCTGC 9060 

TGCCGCTCAG CAACCTGGAG CTGTTCTCCC AGGCGCCGCT GCTGACGCTG GAGACCTACC 9120 

GGCAGATTGG CAAAAACGCT GCGCGCTATG CGCGCAAAGA GTCACCTTCG CCGGTGCCGG 9180 

TGGTGAACGA TCAGATGGTG CGGCCGAAAT TTATGGCCAA AGCCGCGCTA TTTCATATCA 9240 



AAGAGACCAA ACATGTGGTG CAGGACGCCG AGCCCGTCAC CCTGCACATC GACTTAGTAA 9300 

GGGAGTGACC ATGAGCGAGA AAACCATGCG CGTGCAGGAT TATCCGTTAG CCACCCGCTG 9360 

CCCGGAGCAT ATCCTGACGC CTACCGGCAA ACCATTGACC GATATTACCC TCGAGAAGGT 9420 

GCTCTCTGGC GAGGTGGGCC CGCAGGATGT GCGGATCTCC CGCCAGACCC TTGAGTACCA 9480 

GGCGCAGATT GCCGAGCAGA TGCAGCGCCA TGCGGTGGCG CGCAATTTCC GCCGCGCGGC 9540 

GGAGCTTATC GCCATTCCTG ACGAGCGCAT TCTGGCTATC TATAACGCGC TGCGCCCGTT 9600 

CCGCTCCTCG CAGGCGGAGC TGCTGGCGAT CGCCGACGAG CTGGAGCACA CCTGGCATGC 9660 

GACAGTGAAT GCCGCCTTTG TCCGGGAGTC GGCGGAAGTG TATCAGCAGC GGCATAAGCT 9720 

GCGTAAAGGA AGCTAAGCGG AGGTCAGCAT GCCGTTAATA GCCGGGATTG ATATCGGCAA 9780 

CGCCACCACC GAGGTGGCGC TGGCGTCCGA CTACCCGCAG GCGAGGGCGT TTGTTGCCAG 9840 

CGGGATCGTC GCGACGACGG GCATGAAAGG GACGCGGGAC AATATCGCCG GGACCCTCGC 9900 

CGCGCTGGAG CAGGCCCTGG CGAAAACACC GTGGTCGATG AGCGATGTCT CTCGCATCTA 9960 

TCTTAACGAA GCCGCGCCGG TGATTGGCGA TGTGGCGATG GAGACCATCA CCGAGACCAT 10020

TATCACCGAA TCGACCATGA TCGGTCATAA CCCGCAGACG CCGGGCGGGG TGGGCGTTGG 10080 

CGTGGGGACG ACTATCGCCC TCGGGCGGCT GGCGACGCTG CCGGCGGCGC AGTATGCCGA 10140

GGGGTGGATC GTACTGATTG ACGACGCCGT CGATTTCCTT GACGCCGTGT GGTGGCTCAA 10200 

TGAGGCGCTC GACCGGGGGA TCAACGTGGT GGCGGCGATC CTCAAAAAGG ACGACGGCGT 10260 

GCTGGTGAAC AACCGCCTGC GTAAAACCCT GCCGGTGGTG GATGAAGTGA CGCTGCTGGA 10320 

GCAGGTCCCC GAGGGGGTAA TGGCGGCGGT GGAAGTGGCC GCGCCGGGCC AGGTGGTGCG 10380

GATCCTGTCG AATCCCTACG GGATCGCCAC CTTCTTCGGG CTAAGCCCGG AAGAGACCCA 10440

GGCCATCGTC CCCATCGCCC GCGCCCTGAT TGGCAACCGT TCCGCGGTGG TGCTCAAGAC 10500 

CCCGCAGGGG GATGTGCAGT CGCGGGTGAT CCCGGCGGGC AACCTCTACA TTAGCGGCGA 10560 

AAAGCGCCGC GGAGAGGCCG ATGTCGCCGA GGGCGCGGAA GCCATCATGC AGGCGATGAG 10620

CGCCTGCGCT CCGGTACGCG ACATCCGCGG CGAACCGGGC ACCCACGCCG GCGGCATGCT 10680

TGAGCGGGTG CGCAAGGTAA TGGCGTCCCT GACCGGCCAT GAGATGAGCG CGATATACAT 10740 

CCAGGATCTG CTGGCGGTGG ATACGTTTAT TCCGCGCAAG GTGCAGGGCG GGATGGCCGG 10800 

CGAGTGCGCC ATGGAGAATG CCGTCGGGAT GGCGGCGATG GTGAAAGCGG ATCGTCTGCA 10860

AATGCAGGTT ATCGCCCGCG AACTGAGCGC CCGACTGCAG ACCGAGGTGG TGGTGGGCGG 10920 

CGTGGAGGCC AACATGGCCA TCGCCGGGGC GTTAACCACT CCCGGCTGTG CGGCGCCGCT 10980 



GGCGATCCTC GACCTCGGCG CCGGCTCGAC GGATGCGGCG ATCGTCAACG CGGAGGGGCA 11040 

GATAACGGCG GTCCATCTCG CCGGGGCGGG GAATATGGTC AGCCTGTTGA TTAAAACCGA 11100 

GCTGGGCCTC GAGGATCTTT CGCTGGCGGA AGCGATAAAA AAATACCCGC TGGCCAAAGT 11160 

GGAAAGCCTG TTCAGTATTC GTCACGAGAA TGGCGCGGTG GAGTTCTTTC GGGAAGCCCT 11220

CAGCCCGGCG GTGTTCGCCA AAGTGGTGTA CATCAAGGAG GGCGAACTGG TGCCGATCGA 11280

TAACGCCAGC CCGCTGGAAA AAATTCGTCT CGTGCGCCGG CAGGCGAAAG AGAAAGTGTT 11340 

TGTCACCAAC TGCCTGCGCG CGCTGCGCCA GGTCTCACCC GGCGGTTCCA TTCGCGATAT 11400 

CGCCTTTGTG GTGCTGGTGG GCGGCTCATC GCTGGACTTT GAGATCCCGC AGCTTATCAC 11460 

GGAAGCCTTG TCGCACTATG GCGTGGTCGC CGGGCAGGGC AATATTCGGG GAACAGAAGG 11520 

GCCGCGCAAT GCGGTCGCCA CCGGGCTGCT ACTGGCCGGT CAGGCGAATT AAACGGGCGC 11580

TCGCGCCAGC CTCTCTCTTT AACGTGCTAT TTCAGGATGC CGATAATGAA CCAGACTTCT 11640 

ACCTTAACCG GGCAGTGCGT GGCCGAGTTT CTTGGCACCG GATTGCTCAT TTTCTTCGGC 11700 

GCGGGCTGCG TCGCTGCGCT GCGGGTCGCC GGGGCCAGCT TTGGTCAGTG GGAGATCAGT 11760 

ATTATCTGGG GCCTTGGCGT CGCCATGGCC ATCTACCTGA CGGCCGGTGT CTCCGGCGCG 11820

CACCTAAATC CGGCGGTGAC CATTGCCCTG TGGCTGTTCG CCTGTTTTGA ACGCCGCAAG 11880 

GTGCTGCCGT TTATTGTTGC CCAGACGGCC GGGGCCTTCT GCGCCGCCGC GCTGGTGTAT 11940 

GGGCTCTATC GCCAGCTGTT TCTCGATCTT GAACAGAGTC AGCATATCGT GCGCGGCACT 12000

GCCGCCAGTC TTAACCTGGC CGGGGTCTTT TCCACGTACC CGCATCCACA TATCACTTTT 12060

ATACAAGCGT TTGCCGTGGA GACCACCATC ACGGCAATCC TGATGGCGAT GATCATGGCC 12120 

CTGACCGACG ACGGCAACGG AATTC 12145 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 94 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

AGCTTAGGAG TCTAGAATAT TGAGCTCGAA TTCCCGGGCA TGCGGTACCG GATCCAGAAA 60

AAAGCCCGCA CCTGACAGTG CGGGCTTTTT TTTT 94 



(2) INFORMATION FOR SEQ ID NO: 21: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
GGAATTCAGA TCTCAGCAAT GAGCGAGAAA ACCATGC 37
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
GCTCTAGATT AGCTTCCTTT ACGCAGC 27
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
GGCCAAGCTT AAGGAGGTTA ATTAAATGAA AAG 33
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
GCTCTAGATT ATTCAATGGT GTCGGG 26
(2) INFORMATION FOR SEQ ID NO: 25: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 
GCGCCGTCTA GAATTATGAG CTATCGTATG TTTGATTATC TG 42 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
TCTGATACGG GATCCTCAGA ATGCCTGGCG GAAAAT 36

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 51 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
GCGCGGATCC AGGAGTCTAG AATTATGGGA TTGACTACTA AACCTCTATC T 51 
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
GATACGCCCG GGTTACCATT TCAACAGATC GTCCTT 36 



(2) INFORMATION FOR SEQ ID NO: 29: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
TCGACGAATT CAGGAGGA 18

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
CTAGTCCTCC TGAATTCG 18

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
CTAGTAAGGA GGACAATTC 19

(2) INFORMATION FOR SEQ ID NO:32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
CATGGAATTG TCCTCCTTA 19

(2) INFORMATION FOR SEQ ID NO: 33: 



(i) SEQUENCE CHARACTERISTICS: 



(A) LENGTH: 271 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 
(A) ORGANISM: GPP1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 

Met Lys Arg Phe Asn Val Leu Lys Tyr Ile Arg Thr Thr Lys Ala Asn
1 5 10 15

Ile Gln Thr Ile Ala Met Pro Leu Thr Thr Lys Pro Leu Ser Leu Lys
20 25 30

Ile Asn Ala Ala Leu Phe Asp Val Asp Gly Thr Ile Ile Ile Ser Gln
35 40 45

Pro Ala Ile Ala Ala Phe Trp Arg Asp Phe Gly Lys Asp Lys Pro Tyr
50 55 60

Phe Asp Ala Glu His Val Ile His Ile Ser His Gly Trp Arg Thr Tyr
65 70 75 80

Asp Ala Ile Ala Lys Phe Ala Pro Asp Phe Ala Asp Glu Glu Tyr Val
85 90 95

Asn Lys Leu Glu Gly Glu Ile Pro Glu Lys Tyr Gly Glu His Ser Ile
100 105 110

Glu Val Pro Gly Ala Val Lys Leu Cys Asn Ala Leu Asn Ala Leu Pro
115 120 125

Lys Glu Lys Trp Ala Val Ala Thr Ser Gly Thr Arg Asp Met Ala Lys
130 135 140

Lys Trp Phe Asp Ile Leu Lys Ile Lys Arg Pro Glu Tyr Phe Ile Thr
145 150 155 160

Ala Asn Asp Val Lys Gln Gly Lys Pro His Pro Glu Pro Tyr Leu Lys
165 170 175

Gly Arg Asn Gly Leu Gly Phe Pro Ile Asn Glu Gln Asp Pro Ser Lys
180 185 190

Ser Lys Val Val Val Phe Glu Asp Ala Pro Ala Gly Ile Ala Ala Gly
195 200 205

Lys Ala Ala Gly Cys Lys Ile Val Gly Ile Ala Thr Thr Phe Asp Leu
210 215 220

Asp Phe Leu Lys Glu Lys Gly Cys Asp Ile Ile Val Lys Asn His Glu
225 230 235 240

Ser Ile Arg Val Gly Glu Tyr Asn Ala Glu Thr Asp Glu Val Glu Leu
245 250 255

Ile Phe Asp Asp Tyr Leu Tyr Ala Lys Asp Asp Leu Leu Lys Trp
260 265 270

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 555 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: DHABI 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: 

Met Lys Arg Ser Lys Arg Phe Ala Val Leu Ala Gln Arg Pro Val Asn
1 5 10 15

Gln Asp Gly Leu Ile Gly Glu Trp Pro Glu Glu Gly Leu Ile Ala Met
20 25 30

Asp Ser Pro Phe Asp Pro Val Ser Ser Val Lys Val Asp Asn Gly Leu
35 40 45

Ile Val Glu Leu Asp Gly Lys Arg Arg Asp Gln Phe Asp Met Ile Asp
50 55 60

Arg Phe Ile Ala Asp Tyr Ala Ile Asn Val Glu Arg Thr Glu Gln Ala
65 70 75 80

Met Arg Leu Glu Ala Val Glu Ile Ala Arg Met Leu Val Asp Ile His
85 90 95

Val Ser Arg Glu Glu Ile Ile Ala Ile Thr Thr Ala Ile Thr Pro Ala
100 105 110

Lys Ala Val Glu Val Met Ala Gln Met Asn Val Val Glu Met Met Met
115 120 125

Ala Leu Gln Lys Met Arg Ala Arg Arg Thr Pro Ser Asn Gln Cys His
130 135 140

Val Thr Asn Leu Lys Asp Asn Pro Val Gln Ile Ala Ala Asp Ala Ala
145 150 155 160

Glu Ala Gly Ile Arg Gly Phe Ser Glu Gln Glu Thr Thr Val Gly Ile
165 170 175

Ala Arg Tyr Ala Pro Phe Asn Ala Leu Ala Leu Leu Val Gly Ser Gln
180 185 190

Cys Gly Arg Pro Gly Val Leu Thr Gln Cys Ser Val Glu Glu Ala Thr
195 200 205 

Glu Leu Glu Leu Gly Met Arg Gly Leu Thr Ser Tyr Ala Glu Thr Val 
210 215 220 

Ser Val Tyr Gly Thr Glu Ala Val Phe Thr Asp Gly Asp Asp Thr Pro 
225 230 235 240 

Trp Ser Lys Ala Phe Leu Ala Ser Ala Tyr Ala Ser Arg Gly Leu Lys 
245 250 255 

Met Arg Tyr Thr Ser Gly Thr Gly Ser Glu Ala Leu Met Gly Tyr Ser 
260 265 270 

Glu Ser Lys Ser Met Leu Tyr Leu Glu Ser Arg Cys Ile Phe Ile Thr
275 280 285

Lys Gly Ala Gly Val Gln Gly Leu Gln Asn Gly Ala Val Ser Cys Ile
290 295 300

Gly Met Thr Gly Ala Val Pro Ser Gly Ile Arg Ala Val Leu Ala Glu
305 310 315 320

Asn Leu Ile Ala Ser Met Leu Asp Leu Glu Val Ala Ser Ala Asn Asp
325 330 335

Gln Thr Phe Ser His Ser Asp Ile Arg Arg Thr Ala Arg Thr Leu Met
340 345 350

Gln Met Leu Pro Gly Thr Asp Phe Ile Phe Ser Gly Tyr Ser Ala Val
355 360 365

Pro Asn Tyr Asp Asn Met Phe Ala Gly Ser Asn Phe Asp Ala Glu Asp
370 375 380

Phe Asp Asp Tyr Asn Ile Leu Gln Arg Asp Leu Met Val Asp Gly Gly
385 390 395 400

Leu Arg Pro Val Thr Glu Ala Glu Thr Ile Ala Ile Arg Gln Lys Ala
405 410 415

Ala Arg Ala Ile Gln Ala Val Phe Arg Glu Leu Gly Leu Pro Pro Ile
420 425 430

Ala Asp Glu Glu Val Glu Ala Ala Thr Tyr Ala His Gly Ser Asn Glu
435 440 445

Met Pro Pro Arg Asn Val Val Glu Asp Leu Ser Ala Val Glu Glu Met
450 455 460

Met Lys Arg Asn Ile Thr Gly Leu Asp Ile Val Gly Ala Leu Ser Arg
465 470 475 480

Ser Gly Phe Glu Asp Ile Ala Ser Asn Ile Leu Asn Met Leu Arg Gln
485 490 495

Arg Val Thr Gly Asp Tyr Leu Gln Thr Ser Ala Ile Leu Asp Arg Gln
500 505 510

Phe Glu Val Val Ser Ala Val Asn Asp Ile Asn Asp Tyr Gln Gly Pro
515 520 525

Gly Thr Gly Tyr Arg Ile Ser Ala Glu Arg Trp Ala Glu Ile Lys Asn
530 535 540

Ile Pro Gly Val Val Gln Pro Asp Thr Ile Glu
545 550 555 

(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 194 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: DHAB2 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

Met Gln Gln Thr Thr Gln Ile Gln Pro Ser Phe Thr Leu Lys Thr Arg
1 5 10 15

Glu Gly Gly Val Ala Ser Ala Asp Glu Arg Ala Asp Glu Val Val Ile
20 25 30

Gly Val Gly Pro Ala Phe Asp Lys His Gln His His Thr Leu Ile Asp
35 40 45

Met Pro His Gly Ala Ile Leu Lys Glu Leu Ile Ala Gly Val Glu Glu
50 55 60

Glu Gly Leu His Ala Arg Val Val Arg Ile Leu Arg Thr Ser Asp Val
65 70 75 80

Ser Phe Met Ala Trp Asp Ala Ala Asn Leu Ser Gly Ser Gly Ile Gly
85 90 95

Ile Gly Ile Gln Ser Lys Gly Thr Thr Val Ile His Gln Arg Asp Leu
100 105 110

Leu Pro Leu Ser Asn Leu Glu Leu Phe Ser Gln Ala Pro Leu Leu Thr
115 120 125

Leu Glu Thr Tyr Arg Gln Ile Gly Lys Asn Ala Ala Arg Tyr Ala Arg
130 135 140

Lys Glu Ser Pro Ser Pro Val Pro Val Val Asn Asp Gln Met Val Arg
145 150 155 160 



Pro Lys Phe Met Ala Lys Ala Ala Leu Phe His Ile Lys Glu Thr Lys
165 170 175

His Val Val Gln Asp Ala Glu Pro Val Thr Leu His Ile Asp Leu Val
180 185 190

Arg Glu



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 140 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: DHAB3 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

Met Ser Glu Lys Thr Met Arg Val Gln Asp Tyr Pro Leu Ala Thr Arg
1 5 10 15

Cys Pro Glu His Ile Leu Thr Pro Thr Gly Lys Pro Leu Thr Asp Ile
20 25 30

Thr Leu Glu Lys Val Leu Ser Gly Glu Val Gly Pro Gln Asp Val Arg
35 40 45

Ile Ser Arg Gln Thr Leu Glu Tyr Gln Ala Gln Ile Ala Glu Gln Met
50 55 60

Gln His Ala Val Ala Arg Asn Phe Arg Arg Ala Ala Glu Leu Ile Ala
65 70 75 80

Ile Pro Asp Glu Arg Ile Leu Ala Ile Tyr Asn Ala Leu Arg Pro Phe
85 90 95

Arg Ser Ser Gln Ala Glu Leu Leu Ala Ile Ala Asp Glu Leu Glu His
100 105 110

Thr Trp His Ala Thr Val Asn Ala Ala Phe Val Arg Glu Ser Ala Glu
115 120 125

Val Tyr Gln Gln Arg His Lys Leu Arg Lys Gly Ser
130 135 140 

(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 387 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 



(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: DHAT 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 

Met Ser Tyr Arg Met Phe Asp Tyr Leu Val Pro Asn Val Asn Phe Phe
1 5 10 15

Gly Pro Asn Ala Ile Ser Val Val Gly Glu Arg Cys Gln Leu Leu Gly
20 25 30

Gly Lys Lys Ala Leu Leu Val Thr Asp Lys Gly Leu Arg Ala Ile Lys
35 40 45

Asp Gly Ala Val Asp Lys Thr Leu His Tyr Leu Arg Glu Ala Gly Ile
50 55 60

Glu Val Ala Ile Phe Asp Gly Val Glu Pro Asn Pro Lys Asp Thr Asn
65 70 75 80

Val Arg Asp Gly Leu Ala Val Phe Arg Arg Glu Gln Cys Asp Ile Ile
85 90 95

Val Thr Val Gly Gly Gly Ser Pro His Asp Cys Gly Lys Gly Ile Gly
100 105 110

Ile Ala Ala Thr His Glu Gly Asp Leu Tyr Gln Tyr Ala Gly Ile Glu
115 120 125

Thr Leu Thr Asn Pro Leu Pro Pro Ile Val Ala Val Asn Thr Thr Ala
130 135 140

Gly Thr Ala Ser Glu Val Thr Arg His Cys Val Leu Thr Asn Thr Glu
145 150 155 160

Thr Lys Val Lys Phe Val Ile Val Ser Trp Arg Lys Leu Pro Ser Val
165 170 175

Ser Ile Asn Asp Pro Leu Leu Met Ile Gly Lys Pro Ala Ala Leu Thr
180 185 190

Ala Ala Thr Gly Met Asp Ala Leu Thr His Ala Val Glu Ala Tyr Ile
195 200 205

Ser Lys Asp Ala Asn Pro Val Thr Asp Ala Ala Ala Met Gln Ala Ile
210 215 220

Arg Leu Ile Ala Arg Asn Leu Arg Gln Ala Val Ala Leu Gly Ser Asn
225 230 235 240

Leu Gln Ala Arg Glu Asn Met Ala Tyr Ala Ser Leu Leu Ala Gly Met
245 250 255

Ala Phe Asn Asn Ala Asn Leu Gly Tyr Val His Ala Met Ala His Gln
260 265 270

Leu Gly Gly Leu Tyr Asp Met Pro His Gly Val Ala Asn Ala Val Leu
275 280 285

Leu Pro His Val Ala Arg Tyr Asn Leu Ile Ala Asn Pro Glu Lys Phe
290 295 300

Ala Asp Ile Ala Glu Leu Met Gly Glu Asn Ile Thr Gly Leu Ser Thr
305 310 315 320

Leu Asp Ala Ala Glu Lys Ala Ile Ala Ala Ile Thr Arg Leu Ser Met
325 330 335

Asp Ile Gly Ile Pro Gln His Leu Arg Asp Leu Gly Val Lys Glu Ala
340 345 350

Asp Phe Pro Tyr Met Ala Glu Met Ala Leu Lys Asp Gly Asn Ala Phe
355 360 365

Ser Asn Pro Arg Lys Gly Asn Glu Gln Glu Ile Ala Ala Ile Phe Arg
370 375 380

Gln Ala Phe
385 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
GCGAATTCAT GAGCTATCGT ATGTTTG 27

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
GCGAATTCAG AATGCCTGGC GGAAAATC 28



(2) INFORMATION FOR SEQ ID NO: 40: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
GGGAATTCAT GAGCGAGAAA ACCATGCG 
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
GCGAATTCTT AGCTTCCTTT ACGCAGC 
(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
GCGAATTCAT GCAACAGACA ACCCAAATTC 
(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
GCGAATTCAC TCCCTTACTA AGTCG 
(2) INFORMATION FOR SEQ ID NO: 44: 



(i) SEQUENCE CHARACTERISTICS: 



(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:

GGGAATTCAT GAAAAGATCA AAACGATTTG 

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:
GCGAATTCTT ATTCAATGGT GTCGGGCTG 



(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
TTGATAATAT AACCATGGCT GCTGCTGCTG ATAG 
(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:
GTATGATATG TTATCTTGGA TCCAATAAAT CTAATCTTC
(2) INFORMATION FOR SEQ ID NO: 48: 



(i) SEQUENCE CHARACTERISTICS: 



(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 
CATGACTAGT AAGGAGGACA ATTC 24 
(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
CATGGAATTG TCCTCCTTAC TAGT 24 
(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 94 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
AGCTTAGGAG TCTAGAATAT TGAGCTCGAA TTCCCGGGCA TGCGGTACCG GATCCAGAAA 
AAAGCCCGCA CCTGACAGTG CGGGCTTTTT TTTT 
(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
GGAATTCAGA TCTCAGCAAT GAGCGAGAAA ACCATGC 
(2) INFORMATION FOR SEQ ID NO: 52: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 27 base pairs 



(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:52: 
GCTCTAGATT AGCTTCCTTT ACGCAGC 
(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 
GGCCAAGCTT AAGGAGGTTA ATTAAATGAA AAG 
(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
GCTCTAGATT ATTCAATGGT GTCGGG 
(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 
GCGCCGTCTA GAATTATGAG CTATCGTATG TTTGATTATC TG 
(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:56: 
TCTGATACGG GATCCTCAGA ATGCCTGGCG GAAAAT 
(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 



TCGACGAATT CAGGAGGA 



(2) INFORMATION FOR SEQ ID NO: 58: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 
CTAGTCCTCC TGAATTCG 

(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 607 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 

Met Pro Leu Ile Ala Gly Ile Asp Ile Gly Asn Ala Thr Thr Glu Val
1 5 10 15

Ala Leu Ala Ser Asp Tyr Pro Gln Ala Arg Ala Phe Val Ala Ser Gly
20 25 30

Ile Val Ala Thr Thr Gly Met Lys Gly Thr Arg Asp Asn Ile Ala Gly
35 40 45

Thr Leu Ala Ala Leu Glu Gln Ala Leu Ala Lys Thr Pro Trp Ser Met
50 55 60

Ser Asp Val Ser Arg Ile Tyr Leu Asn Glu Ala Ala Pro Val Ile Gly
65 70 75 80

Asp Val Ala Met Glu Thr Ile Thr Glu Thr Ile Ile Thr Glu Ser Thr
85 90 95

Met Ile Gly His Asn Pro Gln Thr Pro Gly Gly Val Gly Val Gly Val
100 105 110

Gly Thr Thr Ile Ala Leu Gly Arg Leu Ala Thr Leu Pro Ala Ala Gln
115 120 125

Tyr Ala Glu Gly Trp Ile Val Leu Ile Asp Asp Ala Val Asp Phe Leu
130 135 140

Asp Ala Val Trp Trp Leu Asn Glu Ala Leu Asp Arg Gly Ile Asn Val
145 150 155 160

Val Ala Ala Ile Leu Lys Lys Asp Asp Gly Val Leu Val Asn Asn Arg
165 170 175

Leu Arg Lys Thr Leu Pro Val Val Asp Glu Val Thr Leu Leu Glu Gln
180 185 190

Val Pro Glu Gly Val Met Ala Ala Val Glu Val Ala Ala Pro Gly Gln
195 200 205 



Val Val Arg Ile Leu Ser Asn Pro Tyr Gly Ile Ala Thr Phe Phe Gly
210 215 220



Leu Ser Pro Glu Glu Thr Gln Ala Ile Val Pro Ile Ala Arg Ala Leu
225 230 235 240

Ile Gly Asn Arg Ser Ala Val Val Leu Lys Thr Pro Gln Gly Asp Val
245 250 255

Gln Ser Arg Val Ile Pro Ala Gly Asn Leu Tyr Ile Ser Gly Glu Lys
260 265 270 



Arg Arg Gly Glu Ala Asp Val Ala Glu Gly Ala Glu Ala Ile Met Gln
275 280 285



Ala Met Ser Ala Cys Ala Pro Val Arg Asp Ile Arg Gly Glu Pro Gly
290 295 300

Thr His Ala Gly Gly Met Leu Glu Arg Val Arg Lys Val Met Ala Ser
305 310 315 320

Leu Thr Gly His Glu Met Ser Ala Ile Tyr Ile Gln Asp Leu Leu Ala
325 330 335

Val Asp Thr Phe Ile Pro Arg Lys Val Gln Gly Gly Met Ala Gly Glu
340 345 350 

Cys Ala Met Glu Asn Ala Val Gly Met Ala Ala Met Val Lys Ala Asp 
355 360 365 



Arg Leu Gln Met Gln Val Ile Ala Arg Glu Leu Ser Ala Arg Leu Gln
370 375 380



Thr Glu Val Val Val Gly Gly Val Glu Ala Asn Met Ala Ile Ala Gly
385 390 395 400 



Ala Leu Thr Thr Pro Gly Cys Ala Ala Pro Leu Ala Ile Leu Asp Leu
405 410 415

Gly Ala Gly Ser Thr Asp Ala Ala Ile Val Asn Ala Glu Gly Gln Ile
420 425 430

Thr Ala Val His Leu Ala Gly Ala Gly Asn Met Val Ser Leu Leu Ile
435 440 445

Lys Thr Glu Leu Gly Leu Glu Asp Leu Ser Leu Ala Glu Ala Ile Lys
450 455 460

Lys Tyr Pro Leu Ala Lys Val Glu Ser Leu Phe Ser Ile Arg His Glu
465 470 475 480



Asn Gly Ala Val Glu Phe Phe Arg Glu Ala Leu Ser Pro Ala Val Phe
485 490 495

Ala Lys Val Val Tyr Ile Lys Glu Gly Glu Leu Val Pro Ile Asp Asn
500 505 510

Ala Ser Pro Leu Glu Lys Ile Arg Leu Val Arg Arg Gln Ala Lys Glu
515 520 525

Lys Val Phe Val Thr Asn Cys Leu Arg Ala Leu Arg Gln Val Ser Pro
530 535 540

Gly Gly Ser Ile Arg Asp Ile Ala Phe Val Val Leu Val Gly Gly Ser
545 550 555 560

Ser Leu Asp Phe Glu Ile Pro Gln Leu Ile Thr Glu Ala Leu Ser His
565 570 575

Tyr Gly Val Val Ala Gly Gln Gly Asn Ile Arg Gly Thr Glu Gly Pro
580 585 590

Arg Asn Ala Val Ala Thr Gly Leu Leu Leu Ala Gly Gln Ala Asn
595 600 605 

(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 142 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 



Met Asn Lys Ser Gln Gln Ile Ala Thr Ile Thr Leu Ala Ala Ala Lys
1 5 10 15

Lys Met Ala Gln Ala Val Glu Ala Lys Ala Leu Glu Ile Asn Val Pro
20 25 30

Val Val Phe Ser Val Val Asp His Gly Gly Asn Thr Leu Leu Met Gln
35 40 45

Arg Met Asp Asp Ala Phe Val Thr Ser Cys Asp Ile Ser Leu Asn Lys
50 55 60

Ala Tyr Thr Ala Cys Cys Leu Arg Gln Gly Thr His Glu Ile Thr Asp
65 70 75 80

Ala Val Gln Pro Gly Ala Ser Leu Tyr Gly Leu Gln Leu Thr Asn Gln
85 90 95

Gln Arg Ile Val Ile Phe Gly Gly Gly Leu Pro Val Ile Leu Asn Gly
100 105 110

Lys Val Ile Gly Ala Val Gly Val Ser Gly Gly Thr Val Glu Gln Asp
115 120 125

Arg Leu Leu Ala Glu Thr Ala Leu Asp Cys Phe Ser Glu Leu
130 135 140 

(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 143 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 

Met Met Asn Lys Ser Gln Gln Val Gln Thr Ile Thr Leu Ala Ala Ala
1 5 10 15

Gln Gln Met Ala Ala Ala Val Glu Lys Lys Ala Thr Glu Ile Asn Val
20 25 30

Ala Val Val Phe Ser Val Val Asp Arg Gly Gly Asn Thr Leu Leu Ile
35 40 45

Gln Arg Met Asp Glu Ala Phe Val Ser Ser Cys Asp Ile Ser Leu Asn
50 55 60

Lys Ala Trp Ser Ala Cys Ser Leu Lys Gln Gly Thr His Glu Ile Thr
65 70 75 80

Ser Ala Val Gln Pro Gly Gln Ser Leu Tyr Gly Leu Gln Leu Thr Asn
85 90 95

Gln Gln Arg Ile Ile Ile Phe Gly Gly Gly Leu Pro Val Ile Phe Asn
100 105 110

Glu Gln Val Ile Gly Ala Val Gly Val Ser Gly Gly Thr Val Glu Gln
115 120 125

Asp Gln Leu Leu Ala Gln Cys Ala Leu Asp Cys Phe Ser Ala Leu
130 135 140 

(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 117 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 

Met Ser Leu Ser Ser Pro Gly Val His Leu Phe Tyr His Ser Arg Trp
1 5 10 15

Gln Gly Thr Arg Val Leu Asp Glu Leu Cys Trp Gly Leu Glu Glu Gln
20 25 30

Gly Val Pro Cys Arg Ala Ile Cys Cys Asp Asp His Asp Cys Ala Leu
35 40 45

Ala Leu Gly Lys Leu Ala Ala Lys Ser Ser Thr Leu Arg Val Gly Leu
50 55 60

Gly Leu Asn Ala Thr Gly Asp Ile Ala Leu Thr His Ala Gln Leu Pro
65 70 75 80

Glu Asp Arg Ala Leu Val Cys Gly His Thr Arg Ala Gly Thr Ala Gln
85 90 95

Ile Arg Thr Leu Gly Ala Asn Ala Gly Gln Leu Val Lys Val Leu Pro
100 105 110

Phe Ser Glu Ile Lys
115 

(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 117 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:63: 



Met Ser Leu Ser Pro Pro Gly Val Arg Leu Phe Tyr Asp Pro Arg Gly 
1 5 10 15 



His His Ala Gly Ala Ile Asn Glu Leu Cys Trp Gly Leu Glu Glu Gln
20 25 30

Gly Val Pro Cys Gln Thr Ile Thr Tyr Asp Gly Gly Gly Asp Ala Ala
35 40 45

Ala Leu Gly Ala Leu Ala Ala Arg Ser Ser Pro Leu Arg Val Gly Ile
50 55 60

Gly Leu Ser Ala Ser Gly Glu Ile Ala Leu Thr His Ala Gln Leu Pro
65 70 75 80

Ala Asp Ala Pro Leu Ala Thr Gly His Val Thr Asp Ser Asp Asp Gln
85 90 95

Leu Arg Thr Leu Gly Ala Asn Ala Gly Gln Leu Val Lys Val Leu Pro
100 105 110 

Leu Ser Glu Arg Asn 
115 

(2) INFORMATION FOR SEQ ID NO: 64:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 176 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:

Met Tyr Arg Ile Tyr Thr Arg Thr Gly Asp Lys Gly Thr Thr Ala Leu
1 5 10 15

Tyr Gly Gly Ser Arg Ile Glu Lys Asp His Ile Arg Val Glu Ala Tyr
20 25 30

Gly Thr Val Asp Glu Leu Ile Ser Gln Leu Gly Val Cys Tyr Ala Thr
35 40 45

Thr Arg Asp Ala Gly Leu Arg Glu Ser Leu His His Ile Gln Gln Thr
50 55 60

Leu Phe Val Leu Gly Ala Glu Leu Ala Ser Asp Ala Arg Gly Leu Thr
65 70 75 80

Arg Leu Ser Gln Thr Ile Gly Glu Glu Glu Ile Thr Ala Leu Glu Arg
85 90 95

Leu Ile Asp Arg Asn Met Ala Glu Ser Gly Pro Leu Lys Gln Phe Val
100 105 110

Ile Pro Gly Arg Asn Leu Ala Ser Ala Gln Leu His Val Ala Arg Thr
115 120 125

Gln Ser Arg Arg Leu Glu Arg Leu Leu Thr Ala Met Asp Arg Ala His
130 135 140

Pro Leu Arg Asp Ala Leu Lys Arg Tyr Ser Asn Arg Leu Ser Asp Ala
145 150 155 160

Leu Phe Ser Met Ala Arg Ile Glu Glu Thr Arg Pro Asp Ala Cys Ala
165 170 175 

(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 176 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 

Met Tyr Arg Ile Tyr Thr Arg Thr Gly Asp Asn Gly Thr Thr Ala Leu
1 5 10 15

Phe Gly Gly Ser Arg Ile Asp Lys Asp Asp Ile Arg Val Glu Ala Tyr
20 25 30

Gly Thr Val Asp Glu Leu Ile Ser Gln Leu Gly Val Cys Tyr Ala Ser
35 40 45

Thr Arg Gln Ala Glu Leu Arg Gln Glu Leu His Ala Met Gln Lys Met
50 55 60

Leu Phe Val Leu Gly Ala Glu Leu Ala Ser Asp Gln Lys Gly Leu Thr
65 70 75 80

Arg Leu Lys Gln Arg Ile Gly Glu Glu Asp Ile Gln Ala Leu Glu Gln
85 90 95

Leu Ile Asp Arg Asn Met Ala Gln Ser Gly Pro Leu Lys Glu Phe Val
100 105 110

Ile Pro Gly Lys Asn Leu Ala Ser Ala Gln Leu His Val Ala Arg Thr
115 120 125

Leu Thr Arg Arg Leu Glu Arg Ile Leu Ile Ala Met Gly Arg Thr Leu
130 135 140

Thr Leu Arg Asp Glu Ala Arg Arg Tyr Ile Asn Arg Leu Ser Asp Ala
145 150 155 160

Leu Phe Ser Met Ala Arg Ile Glu Glu Thr Thr Pro Asp Val Cys Ala
165 170 175 



(2) INFORMATION FOR SEQ ID NO: 66: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1830 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 

ATGCGCTATA TCGCTGGCAT TGATATTGGC AACTCCTCGA CAGAAGTCGC CCTGGCGACG 60

GTCGATGACG CAGGTGTGCT GAACATTCGC CACAGCGCGT TGGCTGAAAC CACGGGTATA 120

AAAGGCACAT TACGAAATGT GTTCGGTATC CAGGAGGCGC TAACGCAGGC GGCAAAAGCG 180

GCCGGCATTC AGCTCAGCGA TATTTCGCTT ATTCGCATTA ACGAAGCCAC GCCGGTCATT 240

GGCGATGTGG CGATGGAAAC CATCACGGAA ACCATCATCA CCGAGTCCAC CATGATCGGC 300

CATAACCCGA AGACACCCGG CGGCGTCGGA CTGGGGGTCG GCATCACCAT CACACCAGAG 360

GCGCTGCTGT CCTGCTCCGC GGACACTCCC TATATTCTGG TGGTCTCCTC GGCCTTTGAC 420

TTTGCCGATG TCGCCGCGAT GGTCAATGCG GCAACGGCAG CGGGCTATCA GATAACCGGC 480

ATTATTTTGC AGCAGGATGA CGGCGTGCTG GTCAATAACC GGCTACAGCA ACCGCTACCG 540

GTGATCGACG AAGTTCAGCA TATCGACCGG ATTCCACTTG GCATGCTGGC GGCCGTCGAG 600

GTCGCTTTAC CCGGTAAGAT CATCGAAACG CTCTCCAACC CTTACGGTAT TGCGACCGTT 660

TTCGATCTCA ACGCCGAGGA GAGCCAAAAT ATCGTGCCAA TGGCACGGGC GCTGATTGGC 720

AACCGCTCGG CCGTGGTGGT GAAAACCCCC TCCGGCGACG TCAAGGCCCG CGCTATTCCG 780

GCAGGTAATC TGTTGCTCAT CGCTCAGGGG CGCAGCGTAC AGGTTGATGT GGCCGCCGGG 840

GCGGAAGCCA TCATGAAAGC GGTTGACGGC TGCGGCAAAC TGGACAACGT CGCGGGAGAA 900

GCGGGCACCA ATATCGGCGG CATGCTAGAG CACGTGCGCC AGACCATGGC GGAGCTTACC 960

AATAAGCCAG CTCAGGAGAT CCGCATTCAG GATCTGCTGG CCGTTGATAC GGCGGTGCCA 1020

GTCAGCGTGA CCGGCGGTCT TGCGGGGGAG TTCTCGCTGG AGCAGGCGGT GGGTATCGCC 1080

TCGATGGTCA AGTCGGATCG CCTGCAGATG GCCCTCATCG CCCGTGAAAT TGAGCACAAA 1140

CTGCAGATTG CGGTTCAGGT GGGCGGCGCC GAAGCGGAGG CGGCCATTCT TGGGGCGCTC 1200

ACCACTCCCG GCACCACGCG CCCGCTGGCG ATCCTCGATC TGGGCGCCGG GTCGACCGAC 1260

GCCTCCATTA TCAATGCGCA GGGAGAGATC AGCGCCACTC ACCTGGCCGG CGCCGGCGAT 1320

ATGGTCACGA TGATCATCGC CCGCGAGCTG GGGCTTGAGG ACCGCTACCT GGCGGAAGAG 1380

ATCAAAAAAT ATCCGCTGGC AAAAGTCGAA AGCCTGTTTC ATCTGCGTCA TGAAGACGGC 1440

AGCGTCCAGT TTTTTCCGTC GGCCTTACCA CCGACGGTAT TTGCCCGCGT CTGCGTGAAA 1500

CCGGATGAAC TGGTTCCCCT GCCCGGCGAT CTGCCGCTGG AGAAAGTGCG CGCAATTCGC 1560

CGTAGCGCCA AATCACGCGT CTTTGTCACC AACGCCCTGC GAGCGTTACG CCAGGTGAGC 1620

CCTACCGGCA ACATTCGCGA CATCCCGTTC GTGGTGCTGG TGGGCGGCTC GTCCCTCGAT 1680

TTCGAGATCC CCCAGCTGGT CACCGACGCG CTGGCGCACT ACCGGCTGGT TGCCGGGCGC 1740

GGCAACATCC GCGGCTGTGA AGGCCCACGC AATGCGGTCG CCAGCGGATT ACTCCTTTCC 1800

TGGCAAAAAG GAGGCACACA TGGAGAGTAG 1830
(2) INFORMATION FOR SEQ ID NO:67: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 609 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: None

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 

Met Arg Tyr Ile Ala Gly Ile Asp Ile Gly Asn Ser Ser Thr Glu Val
1 5 10 15

Ala Leu Ala Thr Val Asp Asp Ala Gly Val Leu Asn Ile Arg His Ser
20 25 30

Ala Leu Ala Glu Thr Thr Gly Ile Lys Gly Thr Leu Arg Asn Val Phe
35 40 45

Gly Ile Gln Glu Ala Leu Thr Gln Ala Ala Lys Ala Ala Gly Ile Gln
50 55 60

Leu Ser Asp Ile Ser Leu Ile Arg Ile Asn Glu Ala Thr Pro Val Ile
65 70 75 80

Gly Asp Val Ala Met Glu Thr Ile Thr Glu Thr Ile Ile Thr Glu Ser
85 90 95

Thr Met Ile Gly His Asn Pro Lys Thr Pro Gly Gly Val Gly Leu Gly
100 105 110

Val Gly Ile Thr Ile Thr Pro Glu Ala Leu Leu Ser Cys Ser Ala Asp
115 120 125

Thr Pro Tyr Ile Leu Val Val Ser Ser Ala Phe Asp Phe Ala Asp Val
130 135 140

Ala Ala Met Val Asn Ala Ala Thr Ala Ala Gly Tyr Gln Ile Thr Gly
145 150 155 160

Ile Ile Leu Gln Gln Asp Asp Gly Val Leu Val Asn Asn Arg Leu Gln
165 170 175

Gln Pro Leu Pro Val Ile Asp Glu Val Gln His Ile Asp Arg Ile Pro
180 185 190

Leu Gly Met Leu Ala Ala Val Glu Val Ala Leu Pro Gly Lys Ile Ile
195 200 205

Glu Thr Leu Ser Asn Pro Tyr Gly Ile Ala Thr Val Phe Asp Leu Asn
210 215 220

Ala Glu Glu Ser Gln Asn Ile Val Pro Met Ala Arg Ala Leu Ile Gly
225 230 235 240 

Asn Arg Ser Ala Val Val Val Lys Thr Pro Ser Gly Asp Val Lys Ala 
245 250 255 

Arg Ala Ile Pro Ala Gly Asn Leu Leu Leu Ile Ala Gln Gly Arg Ser
260 265 270

Val Gln Val Asp Val Ala Ala Gly Ala Glu Ala Ile Met Lys Ala Val
275 280 285

Asp Gly Cys Gly Lys Leu Asp Asn Val Ala Gly Glu Ala Gly Thr Asn
290 295 300

Ile Gly Gly Met Leu Glu His Val Arg Gln Thr Met Ala Glu Leu Thr
305 310 315 320

Asn Lys Pro Ala Gln Glu Ile Arg Ile Gln Asp Leu Leu Ala Val Asp
325 330 335

Thr Ala Val Pro Val Ser Val Thr Gly Gly Leu Ala Gly Glu Phe Ser
340 345 350

Leu Glu Gln Ala Val Gly Ile Ala Ser Met Val Lys Ser Asp Arg Leu
355 360 365

Gln Met Ala Leu Ile Ala Arg Glu Ile Glu His Lys Leu Gln Ile Ala
370 375 380

Val Gln Val Gly Gly Ala Glu Ala Glu Ala Ala Ile Leu Gly Ala Leu
385 390 395 400

Thr Thr Pro Gly Thr Thr Arg Pro Leu Ala Ile Leu Asp Leu Gly Ala
405 410 415

Gly Ser Thr Asp Ala Ser Ile Ile Asn Ala Gln Gly Glu Ile Ser Ala
420 425 430

Thr His Leu Ala Gly Ala Gly Asp Met Val Thr Met Ile Ile Ala Arg
435 440 445

Glu Leu Gly Leu Glu Asp Arg Tyr Leu Ala Glu Glu Ile Lys Lys Tyr
450 455 460

Pro Leu Ala Lys Val Glu Ser Leu Phe His Leu Arg His Glu Asp Gly
465 470 475 480

Ser Val Gln Phe Phe Pro Ser Ala Leu Pro Pro Thr Val Phe Ala Arg
485 490 495

Val Cys Val Lys Pro Asp Glu Leu Val Pro Leu Pro Gly Asp Leu Pro
500 505 510

Leu Glu Lys Val Arg Ala Ile Arg Arg Ser Ala Lys Ser Arg Val Phe
515 520 525

Val Thr Asn Ala Leu Arg Ala Leu Arg Gln Val Ser Pro Thr Gly Asn
530 535 540

Ile Arg Asp Ile Pro Phe Val Val Leu Val Gly Gly Ser Ser Leu Asp
545 550 555 560

Phe Glu Ile Pro Gln Leu Val Thr Asp Ala Leu Ala His Tyr Arg Leu
565 570 575

Val Ala Gly Arg Gly Asn Ile Arg Gly Cys Glu Gly Pro Arg Asn Ala
580 585 590

Val Ala Ser Gly Leu Leu Leu Ser Trp Gln Lys Gly Gly Thr His Gly
595 600 605 

Glu 

(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1824 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:68: 

ATGCCGTTAA TAGCCGGGAT TGATATCGGC AACGCCACCA CCGAGGTGGC GCTGGCGTCC 60

GACTACCCGC AGGCGAGGGC GTTTGTTGCC AGCGGGATCG TCGCGACGAC GGGCATGAAA 120

GGGACGCGGG ACAATATCGC CGGGACCCTC GCCGCGCTGG AGCAGGCCCT GGCGAAAACA 180

CCGTGGTCGA TGAGCGATGT CTCTCGCATC TATCTTAACG AAGCCGCGCC GGTGATTGGC 240

GATGTGGCGA TGGAGACCAT CACCGAGACC ATTATCACCG AATCGACCAT GATCGGTCAT 300

AACCCGCAGA CGCCGGGCGG GGTGGGCGTT GGCGTGGGGA CGACTATCGC CCTCGGGCGG 360

CTGGCGACGC TGCCGGCGGC GCAGTATGCC GAGGGGTGGA TCGTACTGAT TGACGACGCC 420

GTCGATTTCC TTGACGCCGT GTGGTGGCTC AATGAGGCGC TCGACCGGGG GATCAACGTG 480

GTGGCGGCGA TCCTCAAAAA GGACGACGGC GTGCTGGTGA ACAACCGCCT GCGTAAAACC 540

CTGCCGGTGG TGGATGAAGT GACGCTGCTG GAGCAGGTCC CCGAGGGGGT AATGGCGGCG 600

GTGGAAGTGG CCGCGCCGGG CCAGGTGGTG CGGATCCTGT CGAATCCCTA CGGGATCGCC 660

ACCTTCTTCG GGCTAAGCCC GGAAGAGACC CAGGCCATCG TCCCCATCGC CCGCGCCCTG 720

ATTGGCAACC GTTCCGCGGT GGTGCTCAAG ACCCCGCAGG GGGATGTGCA GTCGCGGGTG 780

ATCCCGGCGG GCAACCTCTA CATTAGCGGC GAAAAGCGCC GCGGAGAGGC CGATGTCGCC 840

GAGGGCGCGG AAGCCATCAT GCAGGCGATG AGCGCCTGCG CTCCGGTACG CGACATCCGC 900

GGCGAACCGG GCACCCACGC CGGCGGCATG CTTGAGCGGG TGCGCAAGGT AATGGCGTCC 960

CTGACCGGCC ATGAGATGAG CGCGATATAC ATCCAGGATC TGCTGGCGGT GGATACGTTT 1020

ATTCCGCGCA AGGTGCAGGG CGGGATGGCC GGCGAGTGCG CCATGGAGAA TGCCGTCGGG 1080

ATGGCGGCGA TGGTGAAAGC GGATCGTCTG CAAATGCAGG TTATCGCCCG CGAACTGAGC 1140

GCCCGACTGC AGACCGAGGT GGTGGTGGGC GGCGTGGAGG CCAACATGGC CATCGCCGGG 1200

GCGTTAACCA CTCCCGGCTG TGCGGCGCCG CTGGCGATCC TCGACCTCGG CGCCGGCTCG 1260

ACGGATGCGG CGATCGTCAA CGCGGAGGGG CAGATAACGG CGGTCCATCT CGCCGGGGCG 1320

GGGAATATGG TCAGCCTGTT GATTAAAACC GAGCTGGGCC TCGAGGATCT TTCGCTGGCG 1380

GAAGCGATAA AAAAATACCC GCTGGCCAAA GTGGAAAGCC TGTTCAGTAT TCGTCACGAG 1440

AATGGCGCGG TGGAGTTCTT TCGGGAAGCC CTCAGCCCGG CGGTGTTCGC CAAAGTGGTG 1500

TACATCAAGG AGGGCGAACT GGTGCCGATC GATAACGCCA GCCCGCTGGA AAAAATTCGT 1560

CTCGTGCGCC GGCAGGCGAA AGAGAAAGTG TTTGTCACCA ACTGCCTGCG CGCGCTGCGC 1620

CAGGTCTCAC CCGGCGGTTC CATTCGCGAT ATCGCCTTTG TGGTGCTGGT GGGCGGCTCA 1680

TCGCTGGACT TTGAGATCCC GCAGCTTATC ACGGAAGCCT TGTCGCACTA TGGCGTGGTC 1740

GCCGGGCAGG GCAATATTCG GGGAACAGAA GGGCCGCGCA ATGCGGTCGC CACCGGGCTG 1800

CTACTGGCCG GTCAGGCGAA TTAA 1824



